We Have Mail

Dear Mr. Ward:

I am not much of a letter writer, but after reading the July '89 issue of The C Users Journal I felt I could save some of your readers a lot of time tracking down a problem with the Microsoft C, version 5.10 memory allocation routines. Enclosed is a listing and the output from the program. This may help Steven Isaacson, who is having memory allocation problems using Vitamin C.

I found this problem after a week of tracking down a memory leak in a very large application. My final solution was to write my own malloc()/free() routines that call DOS directly, letting the DOS allocator do what it is supposed to do. No time penalty was noticed in our application. Note: if you do write your own malloc()/free() routines, call them something else! MSC uses these routines internally and makes assumptions about what data is located outside the allocated area. I always use a malloc()/free() shell to test for things like memory leaks and the freeing of a non-allocated block. It also gives you an easy way to install a global 'out of memory' error handler.

The code supplied by Leonard Zerman for finding the amount of free space in a system is simplistic and very limited. A better routine would build a linked list of elements; the variable vptrarray could then be made a single pointer to the head of the list. The entire routine becomes dynamic and much more robust, and there is no danger of overflowing a statically allocated array. See the supplied code for an example. The linked list implementation has the side effect that it will work on a virtual memory system. Why you would want to do this is beyond me, but it could be considered a very time-consuming way to find out what swapmax is set to on a UNIX system.

If you have any questions, please contact me. My phone number is (408) 988-3818. My fax number is (408) 748-1424.
Sincerely yours,
Jim Schimandle
Primary Syncretics
473 Sapena Court, Unit #6
Santa Clara, CA 95054

Thanks for the information. We've included your code in Listing 1. -- rlw

Dear Mr. Ward:

I'm new to programming and need to extract information from old mainframe files. Each file has its own annoying attributes. Some files are reports for printing on 132-column paper, with headers on each page along with errors in tabulation and decimal point alignment. I'd like to know enough about grep, awk, sed, and tr so that I'm not reinventing the wheel with my C programs for file manipulation. Where can I find an understandable and brief overview of these UNIX tools? (I know nothing about regular expressions, scanning, and syntactic analysis.)

Sincerely,
Orion C. Whitaker, M.D.
400 Brookline Ave., #22F
Boston, MA 02215

I suggest The UNIX Programming Environment by Kernighan and Pike. This tidy little book does more to explain how the tools work and work together than any other book I've seen. While it's insightful, it's also a good teaching text. You should also consider The AWK Programming Language by Aho, Kernighan, and Weinberger (the A., W., and K. in awk). If our readers know of other texts that do a good job of explaining how to use the UNIX language-oriented tools, I'd like to hear about them. -- rlw

Thank you for your letter/brochure. First, I have some questions. I studied BASIC last semester at community college and would now like to learn C. My major problem is MY computer. I have a Commodore 64 with 256K RAM expansion and plan to use Abacus Software's Super C Compiler 64. I am a retiree with little prospect of buying a new computer.

1. Do you offer much in this format, or am I butting my head against a wall?

2. Would it be practical for me to attend a class where they are using, probably, IBM compatibles, and do my homework on my system? Would work developed on my system operate on "IBM"s? The disks are not compatible, but could my work be 'retyped' into the "IBM"?
I have Standard C by Plauger & Brodie, and Transactor Magazine has articles which look like they will be useful when I learn more.

Les Maynard
P.O. Box 915
Palmer, AK 99645

Unfortunately, we can't write Commodore disks. However, it's my understanding that if you have the right Commodore drive you can get a program that will let you read MS-DOS disks directly. Whether you can do your C homework on your Commodore depends on several things:

1) Is your instructor willing to accept Commodore output? If you have to run your work on an MS-DOS host to make it acceptable, it probably won't work.

2) What subjects and exercises will the class focus upon? If writing directly to the IBM video display is one of the exercises, it probably isn't reasonable for you to try to work along on the Commodore. If, on the other hand, the class confines itself to general, portable language features and concepts, you will have less trouble.

3) How adept are you at researching your own system? At some point (probably several points), a classroom illustration isn't going to work on your machine. It really isn't fair to expect the instructor to research the problem for you. Can you find your own way?

4) Is your Commodore implementation complete enough to support the scope of the class? Will you be asked to write programs that exceed its memory space? Will you need doubles? Will the exercises require elaborate preprocessor capabilities?

At the very least you should have a serious talk with the instructor before you enroll. Whether work you develop will run on an IBM depends entirely upon the code. If you confine yourself to generic file processing and discipline yourself to avoid, or at least properly hide, any Commodore peculiarities, then your code should run in the IBM environment. (You might find some helpful ideas in Rainer Gerhards' story in this issue.) Please note these are major ifs, even for very experienced C programmers.
-- rlw

Dear CUG,

I am writing to warn you and other users of the problems I have found with LEX parts 1 and 2 on disks number 172 and 173. The program generates code which crashes the system when run.

The problem is in llstin(). If _tabp is NULL, it assigns it the return value of lexswitch(). lexswitch() returns a pointer to the previous table, which is NULL when first called. The result is _tabp being set to NULL forever. Since this table contains pointers to functions, the program jumps off to an unknown address. The source code that was provided will NOT generate this code, indicating that the .exe file was not built from this source! So I rebuilt it and, in testing, found the new .exe produced different tables than the release program did.

There are various solutions to this problem. One is to set _tabp to the location of the table in the .lxi. Another is to edit the generated source file each time, removing the assignment to _tabp in llstin(). Or you could change lexswitch() to return the new value. I don't like the last one because all the documentation states the return value is a pointer to the previous table. Since I am using the -s option, I edit the file, as there is another problem with that option.

The problem with the -s option may only exist with Microsoft C. llstin() is declared as void at the beginning. The function itself is NOT. The compiler produces a diagnostic error. With the incorrect source, the only way around this is to edit the file. (A REAL PAIN if you are using a make file to build the final program.)

I also have a copy of "Bison". It has worked very well with one exception. I found I had to include stdlib.h in simple.prs in order to get rid of several warning messages under certain conditions. One might include it inside the .y file instead. By placing it inside simple.prs I don't have to remember to put it inside the .y. In general, I've found bison to be GREAT. Keep up the good work, and good luck.
Sincerely,
Frank Veenstra
24797 Metric Dr.
Moreno Valley, CA 92387

Yes, the .exe and source files are out of phase. We'll test your fix and remaster the volume with the fix. When we have a new master we'll announce an upgrade in the New Releases column. Thanks for the help. -- rlw

Mr. Robert Ward:

In the May 1989 issue of The C Users Journal, Timothy Prince presented a rather eloquent and detailed article entitled "Efficient Matrix Coding in C". However, I would like to bring to your attention (excuse me if someone already has) an error in that article. Mr. Prince asserts the following to be true:

a[i][j] = *( &a[0][0] + i * I + j )

when given the declaration:

float a[I][J] ;

C stores array elements in row-major order, not in column-major order as suggested above. The valid identity is:

a[i][j] = *( &a[0][0] + i * J + j )

for the given declaration. All the elements of row a[0][.] are located at lower addresses than the first element of row a[1][.], which is stored right after array element a[0][J-1]. Consequently, to access a[i][j], it is necessary to skip i rows of J elements each, plus the j elements before the desired element.

I would also like to take this opportunity to commend you and your staff on producing a Journal that is technically superior to all the other, superficial computer magazines that I have read. That May issue was my first copy of The C Users Journal and it certainly will not be my last.

Sincerely yours,
Girish T. Hagan
27401 Via Olmo
Mission Viejo, CA 92691

Ah yes, the hazard of too much FORTRAN and Pascal. Thanks for correcting our slip -- and thanks for the kind words. -- rlw

Dear Robert,

I have been a member of the C Users' Group for quite a long time now, around the seven-to-eight-year mark. Over this period I have kept all of your newsletters and your present The C Users Journal publications. I have watched the evolution of the Journal with great interest.
During your 'early days' I often reread some of the newsletters when I needed information on a particular piece of code, or on a bug which another member had discovered. But time seems to compress as you get older. These days I rarely have the time to re-read articles unless it is important that I do so.

WHY is he telling me this... do I hear you ask? Well, I hope I have set the scene properly, because I assume you have many more readers than just Phil Cogar who have difficulty finding enough time to squeeze in their preferred reading. Professionals in any line of work tend to be busy people.

Which brings me to the August issue of the Journal and, specifically, the article by Denis Schrader on the FOR_C Translator. Not that I am at all interested in FORTRAN-to-C translators, but I always read the Journal from cover to cover, and I hope my comments will assist in raising the standard of the Journal even further. With respect to Denis Schrader, who I hope does not take offence that I have selected his article to point out what I believe is wrong with some of the User Reports, I would like to direct your attention to this article with the plea that you consider setting certain standards for authors to write to for future User Reports.

So, and without wishing to offend Denis, let me start by asking you to instruct your authors to make their reviews complete (or as complete as they can in the circumstances) as they stand. Don't presume the reader either has access to, or the inclination to look up, an earlier review. Of course rules are meant to be broken, so you might give a reference to something written within the previous several months, but I suggest two years is a bit too long. I refer here specifically to the words: "...which I reviewed in the August 1987 issue... However, comments in this review will point out improvements which have occurred since the release of earlier versions of the product."

Point 2: back up specific comments with specific information.
For example, if you say, "The translator will pay for itself quickly in saved programmer hours," then you should also say how much it costs, both the list price and, if you know it, the street price.

Point 3: if we are talking about a specific product, then either cut out or cut down on the generalisations. An example of this is the comment (statement?), "The translator translates almost 100 percent of ANSI Standard FORTRAN as well... extensions." If the reader is reading the User Report because he or she wants to be better informed about the product (and isn't that the purpose of the User Report?), then, in this case, the comment is of little use unless we are told:

- Whether this (the non-translation of the FORTRAN code) is a transient thing. In other words, do you have to check each piece of translated code for small errors (perhaps for large errors... I don't know, and the Review doesn't say) which might translate to bugs in C; or

- Whether this is systematic, and the FOR_C translator only fails to translate certain pieces of FORTRAN code properly into C. In such a case, does the translator 'flag' the offending pieces of code so they can be corrected using the recommended, known conversion; or

- If your translated C code compiles without the compiler complaining to you, does this mean the code is a 1-for-1 translation of the FORTRAN routine, or not; and so on.

It seems to me that a generalised comment of the type mentioned above does little (nothing?) to better inform the reader about the merits or otherwise of the product.

Point 4: comparisons are odious (or so we are told), but they seem to abound in product reviews. My point is that partial comparisons tend more to mislead the reader than to inform him/her. In other words, we are talking here about a product which translates FORTRAN code into C code. We are not told WHY it is desirable to do this if you already have good, de-bugged FORTRAN routines you wish to incorporate into C programmes.
Please correct me if I have got it wrong but, as I understand the situation, the Microsoft family of microcomputer languages allows you to generate files compiled in BASIC, C, Pascal, FORTRAN, and assembler, any and all of which can be linked into a run-time file as required. I am (most certainly) NOT an apologist for Microsoft, but I do suggest a reviewer has not properly informed the reader as to the merits or otherwise of the product without at least canvassing other alternatives. If Microsoft, for example, have a family of languages which can do the job in another way (you'll notice I didn't say 'a better way' because I don't have a clue which is the better approach; the Review didn't tell me), then the Reviewer should at least mention this. In other words, alert the reader to other possible alternatives, at least. The preferred option would be to make a comparison between the competing products and compare features, strengths, and weaknesses.

So there it is. In summary, my four points are:

Point 1: Make the review as complete as possible in the space allowed. Don't ask the reader to look up other references. We aren't dealing with a scientific paper, just a product review.

Point 2: Give specific (factual) backup to specific comments. It's not that we don't trust reviewers to be objective, but we are discussing opinions here, and my opinion may well differ from the Reviewer's if I am given the opportunity to see what his/her opinion was based on.

Point 3: Leave out generalisations, at least if we are discussing one specific product. Generalisations are OK if we are discussing a 'family' of products. Who was it said, 'All generalisations are false'? Or perhaps I got that wrong?

Point 4: If you believe comparisons (with products from other sources) make the review stronger, then by all means put in the comparisons... but at least try to cover the best alternatives to the product being reviewed. Anything less and you are misleading your readers.
I know it has been tedious, but that's all I wish to say on the subject for the moment. Perhaps you will find something here to put before future Product Reviewers when they submit their articles. My hope is that I have sparked a debate which will lead to an even higher standard for what is already a fine publication.

Yours sincerely,
Phil E. Cogar
P.O. Box 364, Narrabeen, N.S.W.
Australia 2101

I find myself in complete agreement with your four points. I'm sorry the FOR_C article didn't measure up. Generally I'd just as soon do without "reviews". That's why we've used the label "User Reports". I don't really care if someone gave the product four stars -- I want to know what it's like to use the product. Will it require some changes in my work habits? Does it seem to fit a certain design style better than others? Are certain unobvious tricks necessary to achieve certain goals? If someone has spent enough time with a product to be qualified to evaluate it for other experienced programmers, then that person has also learned several things that aren't in the manuals. Why should I have to relearn those items if I decide to buy the product? The writer should give me the full vicarious benefit of his experience.

Here are some of my guidelines for anyone interested in writing a product-related story:

Don't try to sell the product or your philosophy of how products should be designed, tested, marketed, packaged... whatever. Instead, tell us what it does and doesn't do.

Keep the opinions to a minimum. If you give intelligent, experienced readers access to the facts that produced your opinion, they'll reach a similar (or at least reasonable) opinion on their own.

Don't be cute. I don't care how entertaining you think your struggle to remove the shrink wrap was; I don't want to waste time reading about it.

Don't guess. If you aren't certain about a particular issue, either find out or don't mention it.

Don't just list features. That's the role of vendor literature.
Do share all you learned in working with the product. If you include information inappropriate to my audience, I can edit it out. I can't edit in information. I'm acutely aware that we very seldom get product-related copy that fully measures up to these guidelines. We're always working on getting better copy. -- rlw

To The C Users Group:

I am disheartened at the lack of truly advanced, pioneering books in C programming, particularly those of a scientific nature. Numerical Recipes in C and Numerical Software Tools in C are the only two that I have heard of, and they are primarily algorithmic books without instruction. Everyone seems to be publishing the same linked lists, the same databases, and the same TODO lists, just as in assembly language books one gets the same RAM disks, disk caches, and clocks. And that is not just book publishers, either; journals and magazines are doing the same thing. I cannot believe that the programming community lacks such expertise. When will publishers realize that enough is enough and start producing books and articles of a truly advanced nature, like the one you ran on the Fast Walsh Transform? It is also time for a complete numerical methods book written for C programming with a common compiler (MSC, TC), with full descriptions as one would receive in a numerical methods course at a university.

Sincerely,
Jerry Rice
504 Eastland St.
El Paso, TX 79907

Maybe some qualified author (with a willing publisher) will hear your plea. Why do publishers publish the same material over and over? Perhaps because it sells. One of our earlier issues (with several stories covering the fundamentals of device drivers) remains one of our most popular back issues. Perhaps device drivers are old hat to you, but to many they remain a mystery.
Most of our readers are expert programmers; they just aren't all expert in the same areas. -- rlw

Using Header Files To Enhance Portability

Rainer Gerhards

Rainer Gerhards specializes in systems programming and has a strong interest in C. He has written some large-scale control systems and many small utilities in C. He owns his own small software company in addition to managing the computing center of a mid-sized company. He may be contacted at Petronellastrasse 6, 5112 Baesweiler, West Germany.

C is known for its efficient code, rich set of features, and portability. While portability is not built in, you can avoid possible portability problems by anticipating them. Let's look at a few problem areas, suggest some solutions, and examine one method in detail.

One important portability issue is the C dialect that your compiler implements. Although there have always been C language standards, until recently they have been too imprecise to preclude varying interpretations. Early, less powerful machines also forced compiler writers to limit features, contributing additional variant dialects. Thus, some compilers can't understand valid C code if it contains features they don't support. Bit fields are a good example: a number of modern compilers still don't support them. Of course, you could avoid using bit fields, but what if you write for one compiler which doesn't support structure and union assignment and for several others which do? You might avoid these constructs too, but would you prefer to learn, while porting a 50,000-line program which makes extensive use of structure assignment, that the environment to which you're porting doesn't support it? The challenge is to know which features to avoid. Now nearly all commercially-used compilers support C in its entirety, but these compilers offer extra features, especially in the preprocessor area.
Though you may simply avoid these features, you may not know which features are non-standard, especially if you are new to C or if you work in just one environment. Some compiler vendors don't flag such features.

Even an experienced C programmer determined to avoid the problems outlined above by using only standardized constructs still faces the difficulty of deciding which "standard" to use: the original Kernighan and Ritchie (K&R) standard defined in The C Programming Language, or the forthcoming ANSI standard. The ANSI standard resolves many portability problems not addressed by K&R and provides a good base for the future. The ANSI standard is mostly upwardly compatible with K&R; most K&R programs can be moved to ANSI compilers without any problems. But moving code in the opposite direction (from ANSI to K&R) requires special preprocessor tricks, which I'll describe later.

The standard library poses similar problems. Compiler writers have restricted and extended the library rather than the language. Some compilers don't even have a standard library; many libraries include numerous extensions. MS-DOS compilers in particular tend to offer extensions covering graphics, interrupts, and operating system interfaces. Porting code which uses one compiler's extensions to a different compiler can be very difficult.

Operating system differences, because they are the hardest to hide, are among the hardest subjects to address. Operating systems differ greatly -- some are multi-tasking, some are multi-user, and some are single-tasking systems. File-naming conventions are anything but standardized. And these problems are minor compared to the variations in file organization. For example, while most operating systems consider text files to have variable-length records (if any), some use fixed-length records. Records may be delimited by \n, \r\n, or record-length fields. Some OSs use special blocking mechanisms, others don't.
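The ANSI-to-K&R problem mentioned above is often bridged with a prototype macro. Here is a sketch of the general technique (the macro name PROTO is my own, not from the article's Listing 1; environ.h uses its own constants):

```c
#include <assert.h>

/* If __STDC__ is defined, the compiler accepts ANSI prototypes;
 * under a K&R compiler the argument list is expanded away.
 * Note the double parentheses at the call site, which make the
 * whole argument list a single macro argument. */
#ifdef __STDC__
#define PROTO(args) args
#else
#define PROTO(args) ()
#endif

/* One declaration serves both dialects: */
extern double average PROTO((double x, double y));

double average(double x, double y)
{
    return (x + y) / 2.0;
}
```

An ANSI compiler sees the full prototype and can check argument types; a K&R compiler sees only `extern double average();` and compiles on unbothered.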
Fortunately, most standard libraries can hide these differences, but only by distinguishing between text and binary mode, introducing subtle, non-standard features. In addition to processing files, the operating system must support some kind of interaction with the user, which leads to additional problems if you use special system features like asynchronous communication or sophisticated display manipulation.

Hardware differences can cause programs that compile and link without error, and run well in one environment, to crash in another. Often these problems are caused by different word lengths. It's hard for a UNIX programmer working with the portable C compiler (PCC) on a 68xxx machine to learn that the same PCC on 80x86-based machines uses 16 instead of 32 bits for integers. A program that uses integers to index some two million database records on a 68xxx machine may require a major rewrite before it can access more than 32,767 records on the 80x86 machine.

Hardware differences can also affect the portability of pointer casts. Many programmers assume that pointers can simply be cast from one type to another -- a reasonable assumption on most byte machines. However, on word machines (like the Unisys 1100), pointers to word-aligned items differ significantly from pointers to non-aligned items. This is true for some so-called byte machines too. Still other problems arise when you port code from a machine with a segmented address space to one with a linear address space.

The last problem is machine resources. Many programmers assume that if their code is portable and standardized, their program will run on all machines supporting a standard C compiler. While this is basically true, some programs require so much memory or processing time that they simply can't be run on some smaller machines.

Designing For Portability

In spite of these problems, it is possible to write C programs that can be compiled and executed in different environments.
To be portable, a program must be designed and coded in a fashion that hides environmental differences. C's own design hides many such differences. The standard library is a successful attempt to hide some very environment-specific information -- such as the way file system (and some other) calls are made on the target operating system. Without the standard library, every programmer would have to write that interface code himself. Even worse, he would have to rewrite it again and again for each new environment.

You can hide other large environmental differences by creating your own "standard libraries" for other tasks: extract the non-portable operations into a separate source module, define a general interface for this module, and build a different implementation for every environment you want to work with. Many of the high-quality portable support library products available do this for you. Such a library provides "instant" portability, lower cost, and more functionality than an equivalent product written by a single programmer.

While system-specific libraries are appropriate for horrible, non-portable tasks like dealing with the user console, a standardized function call might not make sense for smaller tasks which require only slightly different coding in limited areas of the source. For example, it would not make sense to define a one-line function to set a signal handler under one environment only, especially if the signal handler is called from inside a tight loop where the calling overhead could cause performance problems. The C preprocessor is the obvious tool for these smaller coding differences: just use conditional compilation to enable the code which sets the signal handler in the one environment where it's needed. You don't have to define a large number of functions, and there is no unnecessary calling overhead. The preprocessor can also help solve problems that arise simply because different names are used for the same thing.
For example, nearly every MS-DOS compiler uses its own names for the machine-level i/o (port) functions (for example inp and outp versus inportb and outportb). Fortunately, these functions have the same calling conventions. In this situation, rather than wrap every function call in conditional compilation, use conditional compilation just once to define a macro that in turn calls the function with the right name. Everywhere else, the code uses the macro to call the function.

Macro and constant definitions can also completely hide slight differences in standard library parameters. For example, when working under two different operating systems whose standard libraries have different open modes for text and binary files, you could open a binary file for writing with the call

fp = fopen("file", OPM_WB);

Under UNIX, OPM_WB would be defined as "w" and the call would expand to

fp = fopen("file", "w");

Under MS-DOS (Microsoft C), OPM_WB would be defined as "wb" and the call would expand to

fp = fopen("file", "wb");

Sometimes a simple define can also hide significant hardware differences. Different data type sizes can be hidden by defining your own data types with a guaranteed minimum precision. For example, type int32 (an integer containing at least 32 bits) would be mapped to int for 68xxx machines and to long for 80x86 machines. If int32 has been used in every spot requiring a 32-bit integer, nothing but the definition needs to be changed when porting. (Note that a data type redefinition can be done either with the preprocessor or with a compiler typedef. While the former is potentially more portable, so far I have not seen a compiler which does not implement typedef. Thus I prefer typedef, because sophisticated compilers can do better error checking with it. However, if you want to be absolutely sure that your data type redefinition will be accepted by all old compilers, you must use preprocessor defines.)
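The open-mode and sized-type definitions just described might be sketched as follows. This is an illustrative sketch, not the article's environ.h: the OS test is collapsed to a single constant, and the port-i/o naming appears only in a comment.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative sketch of OS-dependent defines. One constant stands
 * in for the full OS selection done in environ.h. */
#define UNIX 1                /* pretend we are building for UNIX */

#if UNIX
#define OPM_WB "w"            /* UNIX makes no text/binary distinction */
typedef int  int32;           /* PCC on 68xxx: int already has 32 bits */
#else
#define OPM_WB "wb"           /* MS-DOS libraries need explicit binary mode */
typedef long int32;           /* 16-bit int on 80x86, so map int32 to long */
#endif

/* The port-i/o name difference would be hidden the same way, e.g.:
 *   #if MSC
 *   #define port_in(p)  inp(p)
 *   #else
 *   #define port_in(p)  inportb(p)
 *   #endif
 */

int32 records = 2000000L;     /* safe on both machine classes */
```

With these definitions in one header, `fp = fopen("file", OPM_WB);` and every use of `int32` compile unchanged in either environment.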
By now it is obvious that the preprocessor can help make programs more portable. What would make more sense than to combine all these preprocessor-based aids in a single header file? For nearly two years I have been using such a file, working mainly with four different MS-DOS compilers and the UNIX PCC. The idea developed because of minor standard-library differences between MS-DOS compilers, but it soon became clear that the header file could help when porting to UNIX, too. The still-incomplete result is described below.

environ.h

All the necessary preprocessor statements and typedefs are collected in one single file named environ.h (Listing 1). It should be the very first file included. Before including environ.h, you should define which other standard include files you need. This is done by defining preprocessor constants which correspond to standard include file functionality. You read right: functionality -- not names. For example, if you define INCL_ASSERT, not only will the file assert.h be included, but also the file process.h, which MS-DOS/MSC requires. If you compile under UNIX, only assert.h is included. Defining these constants in terms of functionality hides the include file name differences -- an important feature that saves you many conditional directives in the source modules. Microsoft uses a similar system for their OS/2 header files in MSC 5.1. When completely defined for your environment, environ.h should #include all include files needed by your application. If you find it necessary to explicitly include other files, you should extend the definitions in environ.h. They are still incomplete (see lines 274 - 401).

environ.h begins by preventing the accidental inclusion of a header file more than once. Multiple inclusion may damage some preprocessor defines. At best it will cause additional overhead, and at worst, program errors may occur.
To prevent these problems, environ.h checks the preprocessor constant ENVIRON_H. If this constant is defined, environ.h assumes that it has been previously included and takes no further steps (via the #ifndef ENVIRON_H in line 26). If ENVIRON_H is not defined, then this is the first inclusion of environ.h and processing takes place. First, ENVIRON_H is defined, ensuring that no second inclusion will be possible. Next, based on which compiler and operating system are active, environ.h defines the target environment.

Information about the environment is acquired in a relatively straightforward way (lines 29 - 165). Operating-system-specific constants that may be defined automatically by the compiler are purged -- they will be replaced with your own. The #undef of the default definitions is not strictly necessary, but it prevents possible warning messages when the compiler default constants are redefined. The #undefs are followed by defines which select the target OS. Only one may be active at a time. Note the definition to 0 or 1. You could instead define only one OS constant and use #ifdef rather than #if CONSTANT == 1, but this has the disadvantage that K&R compilers have no "#if defined(CONSTANT)". Without this construct it is hard to build complex preprocessor conditionals using only #ifdef and #ifndef, because you can't use Boolean operators. If you define the constants to 0 and 1, you can build normal conditional expressions. This is an advantage when you consider that you must often ask questions like

#if MSDOS && USE_BIOS

Following the OS definitions there are some auxiliary definitions, used only under a specific OS, to identify the target machine. Currently these apply only to certain generic MS-DOS machines without compatible hardware or BIOS, which require actual MS-DOS calls (as opposed to BIOS calls or direct hardware manipulation). The only common example is the early Wang PCs, for which there is a separate definition.
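The structure described so far can be summarized in a small skeleton. This is a sketch of the scheme, not Listing 1 itself: only two OS constants are shown, and the assert.h include exists only for the demonstration checks.

```c
#include <assert.h>           /* only for the demonstration checks */

/* Skeleton of the environ.h structure described in the text. */
#ifndef ENVIRON_H             /* guard against multiple inclusion */
#define ENVIRON_H

#undef MSDOS                  /* purge compiler-supplied defaults... */
#undef UNIX

#define MSDOS 1               /* ...and replace them with our own;  */
#define UNIX  0               /* exactly one constant is set to 1   */

/* Because the constants are always defined as 0 or 1, even a K&R
 * preprocessor can combine them with Boolean operators: */
#if MSDOS && !UNIX
#define TARGET_OK 1
#endif

#endif /* ENVIRON_H */
```

A second inclusion of this file is a no-op: ENVIRON_H is already defined, so the guard skips everything.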
The operating system definitions are followed by the compiler definitions. A specific compiler selection is only necessary if more than one is available under one OS. In my case this is only needed for MS-DOS. But as you can see in environ.h there is only a definition for MSC. All the other compilers I use identify themselves by defining a constant automatically upon startup (e.g., __TURBOC__ for Borland's Turbo C). Note that the MSC constant is overridden if one of the other predefined constants is detected or an OS other than MS-DOS is active (lines 88 - 106). This feature simplifies proper configuration of the header file.

Separate constants for each compiler allow conditional compilation around small compiler differences. To avoid code like "#if MSC || DLC || LC || __TURBOC__ ...", environ.h introduces some language set selection constants (lines 70 - 76). Each define corresponds to one language feature. If the constant is equated to true (1), that language feature can be used; otherwise it cannot. All other decisions are based on these feature selection constants and are much more readable. Now the example given above takes the more intelligible form #if USE_VOID.

To avoid modifying all language selection constants each time you change compilers, environ.h includes an automatic language set selection which redefines the language set constants based on the compiler and OS definitions. While auto selection is currently only functional in the MS-DOS environment, it can easily be expanded to work under different operating systems (lines 129 - 164).

To complete the environment definition, environ.h defines the constant ANSI_C to 0 or 1 according to the compiler's C standard (K&R/ANSI) (lines 119 - 127). This constant is currently set based on the state of a language feature selection (like USE_VOID), but could become more important in the future.

The example header file still lacks one feature, a definition check. All definitions are accepted as entered.
If, for example, the programmer defines two or more operating systems to 1, the behavior of environ.h is undefined but clearly erroneous. This could be avoided by checking the entered definitions to see if two or more are true and aborting compilation if so:

#if MSDOS && UNIX
"Error: Both MSDOS and UNIX selected"
#endif

This code asks for the error condition and generates a compile-time error if it detects one. The error message generated by the compiler points at the real error message in the source module. Examples can be found in CUG library volume 227 (compatible graphics) in file graphics.h. This file contains extensive definition checking.

So far environ.h has supplied definitions that allow conditional compilation in the source units but no automatic porting aids. The balance of the file addresses this second need. Different compiler data types and modifiers can be hidden largely by preprocessor defines. For example, if the compiler doesn't support the void keyword, just define void to nothing, and the void keyword will disappear. Since you didn't use void originally when writing for that compiler, this disappearance will cause no problems. Your code can now be used with compilers that support void without any additional work. That is the key feature of modifier definition: you can hide all data type and modifier differences by simply defining the data type in question to nothing (as in lines 167 - 195 in environ.h).

Here's another example: if a compiler doesn't support the volatile modifier, it normally doesn't do the strange optimizations that force you to use volatile (or they can be turned off), so there is no problem in purging all volatile modifiers from your source. This kind of type redefinition allows you to use the types on machines supporting them without losing backward compatibility. If an older compiler doesn't support these type modifiers, their extra value is gone but your program still runs without problems.
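A self-contained version of that check (the bare string is the article's trick for preprocessors that predate #error; with a consistent configuration, as here, it compiles silently):

```c
#include <assert.h>

#define MSDOS 1
#define UNIX  0

/* Definition check: if two operating systems were selected at once,
 * the bare string below would be compiled as a statement fragment and
 * stop the build with a syntax error pointing at this very line. */
#if MSDOS && UNIX
"Error: Both MSDOS and UNIX selected"
#endif

int exactly_one_os(void) { return MSDOS + UNIX == 1; }
```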
Most data types and modifiers can be treated in this manner. (In some cases you may instead redefine the type to something different -- e.g., define void to int instead of purging it.) However, some types and modifiers, like enum, can't simply be redefined to nothing or to some other value. If you try to redefine these types, your program won't compile, due to the syntax differences between defining a "normal" data item and an enum one. Defining an enum is a process nearly identical to defining a structure or union. Special definitions are required; you can't hide them with one general define. You can still use enum on supporting and non-supporting compilers, but you must define all your enum types using conditional compilation. If the compiler supports enum, you can use it without difficulty. If not, you define an int type and use the preprocessor to define the enum tags:

#if USE_ENUM
typedef enum { A, B } enumtype;
#else
typedef int enumtype;
#define A 0
#define B 1
#endif

This clearly entails more programming work but allows the use of the extended error checking features of compilers that support enum.

You can define your own data types to hide hardware differences, especially machine word length differences. These "personal types" have a guaranteed minimum and maximum precision and are mapped to the actual hardware data types. By relying on them, you can write programs that work on different machines in an expected manner, and you can take memory requirements into account because there is a guaranteed MAXIMUM precision. This problem wasn't critical to me, so the example header file contains only very limited support (lines 258 - 261). Please note that typedefs are used instead of preprocessor defines.

The next problem area is that of standard library function names and calling conventions. For example, calling exit() in C will commonly terminate your program gracefully. Under the Starsys OS, exit() is an OS call something like abort().
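The enum fragment above, made self-contained (USE_ENUM and the tags A and B are the article's; the helper function is added here only so the emulation can be exercised):

```c
#include <assert.h>

#define USE_ENUM 0   /* pretend this compiler has no enum keyword */

#if USE_ENUM
typedef enum { A, B } enumtype;
#else
typedef int enumtype;  /* fall back to int plus preprocessor tags */
#define A 0
#define B 1
#endif

/* Helper (not in the article) showing both branches behave alike. */
enumtype successor(enumtype e) { return e + 1; }
```

Flipping USE_ENUM to 1 compiles the true enum branch with no change to code that uses enumtype, A, or B.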
The real exit() function has been named dx_exit(). This causes problems for all but a few programs and would normally require text modifications. But that's exactly what the preprocessor can do for you: if you're running under Starsys, just define a macro named exit which takes one parameter (the return value). It expands to a call to dx_exit() with that parameter (lines 234 - 236). A similar technique hides the variations among library functions with different names but identical calling parameters and functionality. Example macro definitions can be found a few lines above the exit() macro. File open modes are addressed in lines 241 - 253. Please note that not all open modes are supported, but the definitions can be easily expanded.

Function Prototyping

Unfortunately, ANSI function prototyping is not supported in every environment. Rather than sacrificing the extended error checking that prototyping offers by not using it at all, you can use prototyping when the compiler supports it and turn it off when it does not. Turning off function prototyping is a little harder than turning off an unknown modifier. First you must build two classes of function prototypes, external and internal, corresponding to external and static functions. The external prototype macros appear in lines 197 - 211. The PROTT macro expands to extern func() for a K&R compiler and to extern func(int) for ANSI compilers.

Please note the extra parentheses around int in the PROTT definition. These parentheses become part of the macro argument and are re-expanded. After expansion, they are the function parentheses of extern func(int). These parentheses are especially important if you want to prototype a function with more than one argument. If there were no inner parentheses, the macro would have two arguments, which would force you to write one prototyping macro for every number of function arguments you would ever use.
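A self-contained sketch of the PROTT trick (the macro shape follows Listing 1, lines 197 - 211; the scale function is a hypothetical example, not from the article):

```c
#include <assert.h>

#define USE_PROTT 1   /* set to 0 for a K&R compiler */

#if USE_PROTT
#define PROTT(x) x    /* the inner parentheses survive expansion and
                       * become the function's parameter list */
#else
#define PROTT(x) ()   /* prototype collapses to an empty list */
#endif

/* Thanks to the inner parentheses, the whole parameter list is ONE
 * macro argument, so a single PROTT handles any number of parameters. */
extern long scale PROTT((long value, int factor));

long scale(long value, int factor) { return value * factor; }
```

With USE_PROTT set to 0, the declaration becomes plain `extern long scale ();`, acceptable to a K&R compiler.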
Given these inner parentheses, the whole prototype is one macro argument and a single prototyping macro satisfies all needs.

Normally you write a function header only once for each internal function. It is more difficult to hide these prototypes: the modern ANSI style is to write argument types and names in the function header (e.g., static func(int a)), while the K&R style is to write the argument names only (static func(a)). Fortunately, ANSI compilers accept function headers written in K&R style, but they usually don't build prototypes for such headers. One solution is to write the prototype first and then the actual function header (STATICPT(func, (int)); followed by static func()). In this case the prototype first declares the function extern in order to prototype it (just as is done in application header files). While this has worked well with all ANSI compilers I know of, I'm not certain that it is guaranteed to be legal under the ANSI standard. At first glance you may wonder why the prototype does not have the form static func PROTT((int)), and in fact I am not sure if these constructs are legal. Most compilers accept a function being declared extern and later redefined as static. However, the MSC compiler doesn't accept this construct and generates error messages (at least QC does; CL accepts it with warnings). Instead, MSC allows both the function prototype and the actual function header to be declared static -- the approach used in environ.h. If MSC is active, the prototype attribute is redefined to static.

To do this the macro must have control over the whole prototype line, not just part of it. So a new construct has been created. The macro has two parameters: the function name and the prototype. It expands to the correct modifier followed by the function name and (if selected) the function prototype.
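Here is STATICPT in miniature, using the MSC-style variant in which both the prototype and the definition carry static (macro shape per Listing 1, lines 197 - 211; the function twice and its wrapper are illustrative additions):

```c
#include <assert.h>

#define USE_PROTT 1

#if USE_PROTT
/* MSC variant: the prototype is static, matching the definition. */
#define STATICPT(func, prott) static func prott
#else
#define STATICPT(func, prott) static func ()
#endif

STATICPT(int twice, (int n));   /* expands to: static int twice (int n); */

static int twice(int n) { return 2 * n; }

/* External wrapper so the static function can be called from outside. */
int call_twice(int n) { return twice(n); }
```

Because the macro owns the entire prototype line, it can substitute static or extern as the compiler requires, which no modifier-only define could do.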
This may be a somewhat unusual macro construct, but remember that the C preprocessor is mainly a text substitution tool and not part of the actual compilation process. This allows the preprocessor to make some very strange modifications to the C source code, including constructs like the static function prototyping which cannot be done by any C statement. Building such unusual constructs can give very simple solutions to otherwise intractable problems. The STATICPT() macro can be found between lines 197 and 211.

Conclusions

As you can see, the environmental header file environ.h can aid in writing portable programs, especially in the problem areas of data type, modifier, and name differences. In addition, some machine specifics can be hidden and some newer constructs mapped to work with older compilers. On the other hand, the header file can't hide some differences (e.g., different mechanisms for interacting with the user console). Such differences require special coding that normally should be contained in external modules. But the header file can help you write these modules too, by precisely defining the target environment. Precise functional definitions are the basis for selecting the right code sequences in the low-level driver modules (assuming that coding for more than one environment can be contained in one source unit). The definitions will aid you in activating slightly different source lines which you may have in your program.

Thus, a larger porting system is built from three modules. First, the environment header file describes the environment and hides all differences possible using the preprocessor and typedefs (mainly text substitutions). Second, libraries of standardized functions handle larger problem areas that actually require different coding. Third, conditional compilation within the source modules hides very small differences where the text-substitution capabilities of the preprocessor are insufficient and a special function call makes no sense.
This last option should be limited to cases where it is absolutely necessary, because conditional compilation is not really portable programming, but rather having code for all known environments. If you switch to a new environment, you must not only write new code but also hunt for the problem areas in the source files. To avoid these problems I recommend flagging such lines with special comments (e.g., /*PORT*/).

Related code can be found in the CUG library holdings. Volume CUG227 contains a compatible graphics system which makes extensive use of the preprocessor's text substitution capabilities. Volume CUG265, the cpio starter kit, contains a header file similar to the one discussed here. It also contains programs using it.

Listing 1

1: /*
2:  * e n v i r o n . h
3:  * -----------------
4:  * This module contains environment specific information.
5:  * It's used to make the programs more portable.
6:  *
7:  * @(#)Copyright (C) by Rainer Gerhards. All rights reserved.
8:  *
9:  * Include-file selection defines are:
10:  *
11:  * Define        Class
12:  * ---------------------------------------------------------
13:  * INCL_ASSERT   assert macro and needed functions
14:  * INCL_CONIO    low-level console i/o
15:  * INCL_CONVERT  conversion and classification functions
16:  * INCL_CTYPE    ctype.h
17:  * INCL_CURSES   curses.h
18:  * INCL_LLIO     low-level i/o
19:  * INCL_MEMORY   memory allocation/deallocation functions
20:  * INCL_MSDOS    MS-DOS support
21:  * INCL_PROCESS  process control
22:  * INCL_STDIO    stdio.h
23:  * INCL_STDLIB   standard library functions
24:  * INCL_STRING   string handling functions
25:  */
26: #ifndef ENVIRON_H
27: #define ENVIRON_H
28: 
29: #undef MSDOS
30: #undef OS2
31: #undef UNIX
32: #undef STARSYS
33: 
34: /*
35:  * configurable parameters.
36:  * modify the following parameters according to the target environment.
37:  */
38: 
39: /*
40:  * define target operating system
41:  */
42: #define MSDOS    0
43: #define UNIX     0
44: #define OS2      1
45: #define STARSYS  0
46: 
47: /*
48:  * define target machine
49:  *
50:  * This is auxiliary data only needed for some operating
51:  * systems. Currently only needed if MS-DOS is active.
52:  */
53: #define IBM_PC   1  /* IBM PC, XT, AT & compatibles */
54: #define WANG_PC  0  /* Wang PC, APC ... */
55: 
56: /*
57:  * define target compiler (if necessary)
58:  */
59: #undef MSC
60: #define MSC      1  /* Microsoft C */
61: 
62: #define AUTO_SEL 1
63: /*
64:  * The above #define allows an automatic language set selection. It is
65:  * only functional if the used compiler identifies itself via a #define.
66:  *
67:  * Note: If AUTO_SEL is set, the parameters below are meaningless!
68:  */
69: 
70: #define USE_FAR   0  /* use far keyword */
71: #define USE_NEAR  0  /* use near keyword */
72: #define USE_VOID  1  /* use void keyword */
73: #define USE_VOLA  0  /* use volatile keyword */
74: #define USE_CONST 0  /* use const keyword */
75: #define USE_PROTT 0  /* use function prototypes */
76: #define USE_INTR  0  /* use interrupt keyword */
78: /* +--------------------------------------------------------+
79:  *            End Of Configurable Parameters
80:  * +--------------------------------------------------------+
81:  * Please do not make any changes below this point!
82:  */
83: 
84: #ifndef SYMDEB
85: # define SYMDEB 0
86: #endif
87: 
88: /*
89:  * Check target compiler. Note that the MSC switch is overridden if
90:  * either __TURBOC__ or DLC are defined.
91:  */
92: #ifdef __TURBOC__
93: # undef MSC
94: #endif
95: #ifdef DLC
96: # undef MSC
97: #endif
98: #if STARSYS
99: # undef MSC
100: #endif
101: 
102: #if !(MSDOS || OS2)
103: # undef MSC
104: # undef AUTO_SEL
105: # define AUTO_SEL 0
106: #endif
107: 
108: #if OS2
109: # undef MSC
110: # define MSC 1
111: # undef AUTO_SEL
112: # define AUTO_SEL 1
113: #endif
114: 
115: /*
116:  * Compiler ANSI-compatible?
117:  * (First we assume it's not!)
118:  */
119: #define ANSI_C 0
120: #ifdef MSC
121: # undef ANSI_C
122: # define ANSI_C 1
123: #endif
124: #ifdef TURBO_C
125: # undef ANSI_C
126: # define ANSI_C 1
127: #endif
128: 
129: #if AUTO_SEL
130: # undef USE_FAR
131: # undef USE_NEAR
132: # undef USE_VOID
133: # undef USE_VOLA
134: # undef USE_CONST
135: # undef USE_PROTT
136: # undef USE_INTR
137: # ifdef __TURBOC__
138: #  define USE_FAR   1
139: #  define USE_NEAR  1
140: #  define USE_VOID  1
141: #  define USE_VOLA  1
142: #  define USE_CONST 1
143: #  define USE_PROTT 1
144: #  define USE_INTR  1
145: # endif
146: # ifdef DLC
147: #  define USE_FAR   1
148: #  define USE_NEAR  1
149: #  define USE_VOID  1
150: #  define USE_VOLA  1
151: #  define USE_CONST 1
152: #  define USE_PROTT 1
153: #  define USE_INTR  0
154: # endif
155: # ifdef MSC
156: #  define USE_FAR   1
157: #  define USE_NEAR  1
158: #  define USE_VOID  1
159: #  define USE_VOLA  1
160: #  define USE_CONST 1
161: #  define USE_PROTT 1
162: #  define USE_INTR  1
163: # endif
164: #endif
165: 
166: 
167: #if !USE_FAR
168: #define far
169: #endif
170: 
171: #if !USE_NEAR
172: #define near
173: #endif
174: 
175: #if !USE_VOID
176: #define void
177: #endif
178: 
179: #if !USE_VOLA
180: #define volatile
181: #endif
182: 
183: #if !USE_CONST
184: #define const
185: #endif
186: 
187: #if USE_INTR
188: # ifdef MSC
189: #  define INTERRUPT interrupt far
190: # else
191: #  define INTERRUPT interrupt
192: # endif
193: #else
194: # define INTERRUPT
195: #endif
196: 
197: #if USE_PROTT
198: # define PROTT(x) x
199: # ifdef MSC
200: #  define STATICPT(func, prott) static func prott
201: # else
202: #  define STATICPT(func, prott) extern func prott
203: # endif
204: #else
205: # define PROTT(x) ()
206: # ifdef MSC
207: #  define STATICPT(func, prott) static func ()
208: # else
209: #  define STATICPT(func, prott) extern func ()
210: # endif
211: #endif
212: 
213: #ifdef MSC
214: # define inportb(port)       inp(port)
215: # define outportb(port, val) outp(port, val)
216: #endif
217: 
218: #ifdef __TURBOC__
219: # define REGPKT struct REGS
220: #else
221: # define REGPKT union REGS
222: #endif
223: 
224: #ifdef DLC
225: # define defined(x)
226: # define inportb  inp
227: # define outportb outp
228: #endif
229: 
230: #if !SYMDEB  /* symbolic debugging support */
231: # define STATICATT static
232: #endif
233: 
234: #if STARSYS
235: # define exit(x) dx_exit(x)
236: #endif
237: 
238: /*
239:  * Define open modes according to selected operating system/compiler.
240:  */
241: #if MSDOS || OS2
242: # define OPM_WB "wb"
243: # define OPM_WT "wt"
244: # define OPM_RB "rb"
245: # define OPM_RT "rt"
246: #endif
247: 
248: #if UNIX
249: # define OPM_WB "w"
250: # define OPM_WT "w"
251: # define OPM_RB "r"
252: # define OPM_RT "r"
253: #endif
254: 
255: #define TRUE  1
256: #define FALSE 0
257: 
258: typedef unsigned char  uchar;
259: typedef int            bool;
260: typedef unsigned short ushort;
261: typedef unsigned long  ulong;
262: 
263: #define tonumber(x) ((x) - '0')
264: #define FOREVERL()  for(;;)
265: 
266: /*
267:  * Select #include-files depending on target compiler and OS.
268:  *
269:  * Phases:
270:  *  1. Define all include selection constants to true or false.
271:  *  2. Select actual include files and include them.
272:  *  3. #undef all include selection constants.
273:  */
274: #ifndef INCL_STDIO
275: # define INCL_STDIO 0
276: #else
277: # undef INCL_STDIO
278: # define INCL_STDIO 1
279: #endif
280: #ifndef INCL_CURSES
281: # define INCL_CURSES 0
282: #else
283: # undef INCL_CURSES
284: # define INCL_CURSES 1
285: #endif
286: #ifndef INCL_CTYPE
287: # define INCL_CTYPE 0
288: #else
289: # undef INCL_CTYPE
290: # define INCL_CTYPE 1
291: #endif
292: #ifndef INCL_ASSERT
293: # define INCL_ASSERT 0
294: #else
295: # undef INCL_ASSERT
296: # define INCL_ASSERT 1
297: #endif
298: #ifndef INCL_LLIO
299: # define INCL_LLIO 0
300: #else
301: # undef INCL_LLIO
302: # define INCL_LLIO 1
303: #endif
304: #ifndef INCL_PROCESS
305: # define INCL_PROCESS 0
306: #else
307: # undef INCL_PROCESS
308: # define INCL_PROCESS 1
309: #endif
310: #ifndef INCL_MEMORY
311: # define INCL_MEMORY 0
312: #else
313: # undef INCL_MEMORY
314: # define INCL_MEMORY 1
315: #endif
316: #ifndef INCL_STRING
317: # define INCL_STRING 0
318: #else
319: # undef INCL_STRING
320: # define INCL_STRING 1
321: #endif
322: #ifndef INCL_STDLIB
323: # define INCL_STDLIB 0
324: #else
325: # undef INCL_STDLIB
326: # define INCL_STDLIB 1
327: #endif
328: #ifndef INCL_CONVERT
329: # define INCL_CONVERT 0
330: #else
331: # undef INCL_CONVERT
332: # define INCL_CONVERT 1
333: #endif
334: #ifndef INCL_MSDOS
335: # define INCL_MSDOS 0
336: #else
337: # undef INCL_MSDOS
338: # define INCL_MSDOS 1
339: #endif
340: #ifndef INCL_CONIO
341: # define INCL_CONIO 0
342: #else
343: # undef INCL_CONIO
344: # define INCL_CONIO 1
345: #endif
346: 
347: #if INCL_STDIO && !(INCL_CURSES && UNIX)
348: # include <stdio.h>
349: #endif
350: #if INCL_CURSES && UNIX
351: # include <curses.h>
352: #endif
353: #if INCL_CTYPE || INCL_CONVERT
354: # include <ctype.h>
355: #endif
356: #if INCL_ASSERT
357: # include <assert.h>
358: # ifdef MSC
359: #  undef INCL_PROCESS
360: #  define INCL_PROCESS 1
361: # endif
362: # ifdef __TURBOC__
363: #  undef INCL_PROCESS
364: #  define INCL_PROCESS 1
365: # endif
366: #endif
367: #if INCL_LLIO
368: # ifdef MSC
369: #  include <io.h>
370: #  include <fcntl.h>
371: # endif
372: #endif
373: #if INCL_PROCESS
374: # ifdef MSC
375: #  include <process.h>
376: # endif
377: #endif
378: #if INCL_MEMORY
379: # include <malloc.h>
380: #endif
381: #if INCL_STRING
382: # if ANSI_C
383: #  include <string.h>
384: # endif
385: #endif
386: #if INCL_STDLIB || INCL_CONVERT
387: # if ANSI_C
388: #  include <stdlib.h>
389: # endif
390: #endif
391: #if INCL_CONIO
392: # ifdef __TURBOC__
393: #  include <conio.h>
394: # endif
395: # ifdef MSC
396: #  include <conio.h>
397: # endif
398: #endif
399: #if MSDOS && INCL_MSDOS
400: # include <dos.h>
401: #endif
402: 
403: 
404: /*
405:  * Purge utility #defines.
406:  */
407: #undef INCL_STDIO
408: 
409: #endif

Writing Standard Headers: The String Functions

Dan Saks

Dan Saks is the owner of Saks & Associates, which offers training and consulting in C and C++. He is a member of X3J11, the ANSI C committee. He has an M.S.E. in computer science from the University of Pennsylvania. You can write to him at 287 W. McCreight Ave., Springfield, OH 45504 or call (513) 324-3601.

In a recent letter to The C Users Journal, Phil Cogar of N.S.W. Australia complained that much of the C source code appearing in this and other programming journals contains references to headers that are not published along with the code. He observed that if your compiler provides these headers, then typing in the code and getting it to run is usually easy; without them, it may be impossible. He has a legitimate complaint, but as editor Robert Ward points out in his response, it's often impractical to publish the headers with the code. (See The C Users Journal, October 1989, p. 138.)

To get the programs to run, you can write your own standard headers to go with your existing compiler and library. Although writing an entire Standard C library from scratch is a big chore, you can fill many of the gaps in an existing library by yourself in only a few days.

The Standard Headers

The fifteen headers specified by the Standard are summarized in Table 1.
Most of them declare a set of related library functions, along with any macros and types needed to call them. A few headers don't contain any functions; they simply define useful macros and types that have nowhere else to go. Some macros and types appear in more than one header, but each function is declared only once.

Most compilers supply additional headers. For example, UNIX compilers add headers of their own, and many MS-DOS compilers supply some of the UNIX headers along with others specific to MS-DOS. None of these headers is covered by the C Standard. Some UNIX headers have been formalized by the IEEE 1003.1 POSIX Portable Operating System Standard, but many aren't covered by any non-proprietary standard. A C program using library headers other than those listed in Table 1 will not be portable to all Standard C implementations.

A program accesses the contents of a standard header by referencing the header in an include directive, such as #include <string.h>. Headers are often referred to as "include files" because they are almost always implemented as source files with the same names. Other implementations are permitted, and so the Standard is careful not to refer to them as files. Nevertheless, "headers" and "include files" are generally understood to mean the same thing.

Determining What You Already Have

Before starting to fix your standard headers, you should look to see what you already have. Headers are usually easy to locate. For example, on UNIX systems the headers for cc are usually in /usr/include (see the subheading FILES on the manual page(s) for cc(1) in your UNIX manual). The default setup for Turbo C on MS-DOS places the headers in \turboc\include. Most MS-DOS compilers do something similar. The headers for DECUS C on my PDP-11 are in the same subdirectory as my compiler executables, which is a subdirectory with the logical name C:. You should not be surprised to find that you already have several of the standard headers.
The standard library is not pure invention; it's the result of an effort to "codify common existing practice." You will almost certainly find a version of <stdio.h> -- the only standard header used by Kernighan and Ritchie in the first edition of The C Programming Language. One or two others are also extremely common. Beyond that, it's hard to say just how many headers you're likely to find. For example, the DECUS C compiler has only four of the standard headers, while the UNIX 4.2 BSD compiler (cc) has those four plus several more, as well as a header very similar to one of the standard ones. Turbo C 2.0, Microsoft C 5.1 and Zortech C 1.07 (all for MS-DOS) have all but one of the standard headers, but very few of the headers among all three compilers are exactly as they should be.

Where To Put New Headers

Before you start creating and modifying headers, you should think about where to put them. You can throw caution to the wind and put the new headers in the same directory as your existing ones (assuming you have the access rights), but then you run a serious risk that some of your old code won't work with the new headers. I recommend creating a directory for your new headers and reconfiguring your compiler environment to search this new directory before it searches the old one. You can then remove the new headers from the search if you have to. Compiler environments vary so much that I can't explain how to do this for everyone, but I will show you what I've done on a few different systems.

On UNIX 4.2 BSD: I put the new headers in a subdirectory usr/include within my home directory (/u/dsaks). I wrote a shell script called cc that simply contains

/bin/cc -I/u/dsaks/usr/include $*

This script invokes the UNIX C compiler (in /bin) with the -I option. -I tells the compiler to search for include files in the named directory before searching in the standard places. The $* passes all the arguments to the cc script through to the C compiler. I put this script in /u/dsaks/usr/bin, and added this directory name to my shell path variable.
I made the script executable with

chmod +x cc

This cc command compiles with the new headers. If I need to omit them, I simply rename the command with

mv cc cc.new

so that the cc command reverts to the one in /usr/bin (without -I).

On MS-DOS 3.0 and higher: I put the original headers for Microsoft C and Quick C in \ms\include, and my new headers in \ms\usr\include. Both compilers support the -I option, so you can create a cc.bat command file like the UNIX shell script. Yet Microsoft gives you an easier alternative. The Microsoft compilers use the INCLUDE environment variable to define the search path for include files. I use two different command files to configure the compiler environment. My msnew.bat uses

set INCLUDE=c:\ms\usr\include;c:\ms\include

to put the new headers in the search path, while msold.bat uses

set INCLUDE=c:\ms\include

to take them out. Other MS-DOS compilers require slightly different approaches. Zortech's command line compiler, ZTC, uses the INCLUDE environment variable just like Microsoft C, but their integrated environment, ZED, gets its search path from a configuration file maintained by a utility called ZCONFIG. Borland's Turbo C lets you specify the search path in a file called TURBOC.CFG. Consult your compiler user's guide for details.

On RT-11 V5.0 and higher: The DECUS C compiler has a built-in preprocessor that's virtually useless. Fortunately, the compiler is distributed with MP, a decent preprocessor from the UNIX User's Group. My compilation command files disable the built-in preprocessor (with the /M compiler switch) and use MP instead. MP has a preset search path for include files. First it looks in the directory with the logical name LB:, then in C:, and finally in SY:. I put the original headers in a directory assigned to C: and the new headers in another directory assigned to LB:. I can remove the new headers from the search by deassigning LB:.
I'll begin with <string.h> because it's often missing and yet is easy to create. Once you have it, you'll use it frequently. <string.h> (see Table 2) declares the string handling functions in the library. It also declares one macro, NULL, and one type, size_t, that are needed to use these functions.

There is no universal way to define NULL -- you tailor the definition to your machine's architecture. The easiest way to obtain a definition for NULL is to steal one from <stdio.h>. If you can't find a definition there or in some other header, then you should probably use

#define NULL ((void *)0)

if your compiler supports the void * type, or

#define NULL ((char *)0)

if it doesn't. If you know that your pointers have the same size as type int, you can use simply

#define NULL 0

If the pointers on your machine have the same size as type long int, you can use

#define NULL 0L

I prefer to use the casts to determine the size of NULL. However, I suspect you'll find that one of the latter two forms is already used in your existing headers. Whichever form you choose, use it consistently.

Most MS-DOS C compilers provide pointers in two different sizes, near and far. The headers in these compilers use conditional compilation to select the appropriate definition for NULL, something like

#ifdef _NEAR_POINTERS
#define NULL 0
#else
#define NULL 0L
#endif

If your <string.h> needs a definition like this, you should find it in one of your existing headers. (For more insight into the possible definitions for NULL, see "Doctor C's Pointers: The 'NULL' Macro and Null Pointers" by Rex Jaeschke in The C Users Journal, Sept/Oct, 1988.)

NULL is defined in several standard headers. The headers may be included in any order, and a given header may be included more than once, so you must insure that the repeated definitions for NULL don't conflict with each other. Most implementations permit "benign" macro redefinitions (repeated definitions formed by identical sequences of tokens) as specified in the Standard.
In this case, make all the definitions the same. If your preprocessor doesn't allow any redefinitions, you will have to put a "protective wrapper" around each one, as in

#ifndef NULL
#define NULL ((void *)0)
#endif

size_t is the type of the result of the sizeof operator. The Standard says that it should be an unsigned integral type, so use either

typedef unsigned size_t;

or

typedef unsigned long size_t;

You can select the appropriate definition using the program in Listing 1. In many C implementations, sizeof yields a signed int value. You should still define size_t as unsigned, so that operations on objects of that type have the proper unsigned behavior. You can always use size_t to cast the possibly negative result of sizeof to its 'true' unsigned value, as in

if ((size_t)sizeof(something_big) > 0)

For more about size_t and sizeof, see "Doctor C's Pointers: Exploring the Subtle Side of the 'sizeof' Operator" by Rex Jaeschke in The C Users Journal, Feb., 1988 or see Rex's book, listed in References.

As with NULL, size_t appears in several standard headers. The Standard and many implementations do not allow typedef redefinitions (even "benign" ones) in the same scope, so you may need a protective wrapper around each definition. For example

#ifndef _SIZE_T_DEFINED
typedef unsigned size_t;
#define _SIZE_T_DEFINED
#endif

You don't have to use the name _SIZE_T_DEFINED. Any identifier beginning with an underscore followed by an upper-case letter or another underscore will do. The Standard reserves these names for the implementation of the compiler (of which the headers are a part). Since benign macro redefinitions are usually allowed, you may be tempted to define size_t as

#define size_t unsigned

in order to eliminate the protective wrapper. I have seen this done in some "ANSI-conforming" compilers. Although you will probably never notice the difference, the macro definition is wrong because it changes the scope of size_t. Use the typedef.

And now for the functions.
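Listing 1 of this article is not reproduced in this excerpt. A minimal sketch of such a selection program (an assumption about its shape, not the article's actual code) simply reports which typedef matches the width that sizeof yields on the machine at hand:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the article's Listing 1: pick the typedef
 * for size_t whose width matches what sizeof actually yields here.
 * sizeof(sizeof(int)) is the size of the type that sizeof returns. */
const char *size_t_definition(void)
{
    return sizeof(sizeof(int)) <= sizeof(unsigned)
         ? "typedef unsigned size_t;"
         : "typedef unsigned long size_t;";
}
```

Run once per target machine and paste the reported line into your new <string.h>.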
Most older C compilers don't support prototypes, so you might have to delete or "comment out" the parameter lists. Some functions return void *. If your compiler won't accept that type, use char *. You will find that your library contains some, but not all, of the string functions. Sometimes you will find a standard C function under an archaic name. Many recent books on C have an appendix that details the functions in the standard library. (See references at the end of the article.) You should compare the functions in the standard library with the functions in your compiler's library to find as many matches as you can. For example, some implementations use index instead of strchr. In this case, you could declare strchr as char *index(); #define strchr(s, c) index(s, c) but there is a hazard. If you forget that strchr is really index, and write another function called index, you will inadvertently redefine strchr. (This is an excellent way to test your debugging skills.) This macro definition should only be used as an interim fix until you add a compiled version of the missing function to the run-time library. What about functions that are completely missing? Should you still put their declarations in <string.h>? The answer is a definite maybe. Suppose that memchr is missing from your library. memchr returns a void *, but if you leave the declaration out of <string.h>, the compiler will assume it returns an int. When you compile char *p, s[10]; p = memchr(s, 'x', 10); you may get a spurious warning about an illegal pointer assignment, but compilation will continue. You won't know what's really happening until the linker reports that memchr is undefined. Under these circumstances, you should declare memchr in the header to eliminate the unnecessary warnings. If you use a Lint-like program checker that can detect undeclared functions (or if your compiler has such an option), then don't declare functions that are missing from the library.
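As a sketch of what such a compiled replacement for a missing function might look like, here is a minimal strchr. It is written in modern, prototyped style; on the older compilers discussed here you would drop the prototype and const. The name my_strchr is used only to avoid colliding with a library that already has a strchr:

```c
#include <stddef.h>  /* for NULL; pre-ANSI systems would use a local definition */

/* Minimal strchr: return a pointer to the first occurrence of c
 * (converted to char) in s, or NULL if c does not occur.  The
 * terminating nul counts as part of the string, so searching for
 * '\0' returns a pointer to the terminator. */
char *my_strchr(const char *s, int c)
{
    for (;; ++s) {
        if (*s == (char)c)
            return (char *)s;
        if (*s == '\0')
            return NULL;
    }
}
```

Once a routine like this is compiled into the run-time library, the interim #define strchr(s, c) index(s, c) macro can be removed from the header.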
When you reference a missing function, you will still get a meaningful error message, but won't have to wait for the linker to tell you what you already know. Listing 2 shows the <string.h> that I use on UNIX 4.2 BSD. It includes some interim macro definitions for missing functions. The #ifndef ... #endif wrapper around the entire header prevents repeated compilation of the declarations if the header is included more than once. The wrapper isn't needed for protection since you can redeclare functions (provided all declarations in the same scope are the same), and everything else in the header is either benign or protected. I added the wrapper to simplify debugging. While debugging macros, I sometimes look at the preprocessor output to verify the expansions. Eliminating redundant headers from preprocessor output makes it easier to read. The comment at the header's beginning is not in the wrapper so it still appears wherever the header is included, even if the rest of the header does not. One final word of caution. In Listing 2, strlen is declared to return a size_t, even though strlen is actually defined in the library to return an int. On machines where a signed int to unsigned int conversion performs no transformation of the data (as on twos-complement machines), strlen returning a size_t is perfectly safe. On other machines, you should leave the declaration as int strlen(); so that the compiler can recognize that size_t n; n = strlen(s); involves a signed to unsigned conversion and generate the proper code. You should also cast the result of strlen to size_t whenever strlen is used in an expression with other ints, such as if ((size_t)strlen(s) > 0) This is the same technique used with sizeof when it returns an int. Conclusion In this article I've tried to show why it's impossible to just publish a single portable version of the standard headers. The headers provide a portable definition of the Standard C environment, but they do it in a non-portable way.
Rather than writing the missing string functions in the library, I suggest you write the remaining standard headers. Doing so solves more portability problems and gives you the definitions you need to compile new library functions as you write them. In <string.h>, you've already seen many of the design problems, so most of the remaining work is simply determining what goes into the other headers.

References

Darnell, Peter and Margolis, Philip, Software Engineering in C (1988, Springer-Verlag).
Gardner, James, From C to C: An Introduction to ANSI Standard C (1989, Harcourt Brace Jovanovich).
Jaeschke, Rex, Portability and the C Language (1989, Hayden Books).
Plauger, P.J. and Brodie, Jim, Standard C (1989, Microsoft Press).
Ritchie, Dennis and Kernighan, Brian, The C Programming Language, 2nd ed. (1988, Prentice-Hall).

Table 1 Standard Headers

assert.h - program diagnostics
ctype.h - character testing and case mapping
errno.h - error reporting
float.h - floating type characteristics
limits.h - integral type sizes
locale.h - local customs
math.h - mathematics
setjmp.h - non-local jumps
signal.h - signal handling
stdarg.h - variable-length argument lists
stddef.h - common definitions
stdio.h - input and output
stdlib.h - general utilities
string.h - string handling
time.h - date and time utilities

Table 2 Summary of <string.h>

Macros: NULL
Types: size_t
Function Prototypes:
void *memchr(const void *, int, size_t);
int memcmp(const void *, const void *, size_t);
void *memcpy(void *, const void *, size_t);
void *memmove(void *, const void *, size_t);
void *memset(void *, int, size_t);
char *strcat(char *, const char *);
char *strchr(const char *, int);
int strcoll(const char *, const char *);
int strcmp(const char *, const char *);
char *strcpy(char *, const char *);
size_t strcspn(const char *, const char *);
char *strerror(int);
size_t strlen(const char *);
char *strncat(char *, const char *, size_t);
int strncmp(const char *, const char *, size_t);
char *strncpy(char *, const char *,
size_t);
char *strpbrk(const char *, const char *);
char *strrchr(const char *, int);
size_t strspn(const char *, const char *);
char *strstr(const char *, const char *);
char *strtok(char *, const char *);

Listing 1

/*
 * write the definition for size_t
 */
#include <stdio.h>

main()
{
    printf("typedef unsigned%s size_t;\n",
        sizeof(sizeof(int)) == sizeof(int) ? "" : " long");
}

Listing 2

/*
 * string.h - string handling (for cc on UNIX 4.2 BSD)
 */
#ifndef _STRING_H_INCLUDED

#define NULL ((char *)0)

#ifndef _SIZE_T_DEFINED
typedef unsigned size_t;
#define _SIZE_T_DEFINED
#endif

char *strcat();
int strcmp();
char *strcpy();
size_t strlen();
char *strncat();
int strncmp();
char *strncpy();

/*
 * interim macro definitions for functions
 */
char *index();
#define strchr(s, c) index(s, c)

extern int sys_nerr;
extern char *sys_errlist[];
#define strerror(e) \
    ((e) < sys_nerr ? sys_errlist[e] : "?no message?")

char *rindex();
#define strrchr(s, c) rindex(s, c)

/*
 * missing functions
 */
char *memchr();
int memcmp();
char *memcpy();
char *memmove();
char *memset();
int strcoll();
size_t strcspn();
char *strpbrk();
size_t strspn();
char *strstr();
char *strtok();
size_t strxfrm();

#define _STRING_H_INCLUDED
#endif

UNIX 'termcap' Facility Improves Portability By Hiding Terminal Dependencies

Ronald Florence

Ronald Florence is a novelist, sheep farmer, occasional computer consultant, and UNIX addict. He can be reached at ron@mlfarm or ... {hsi,rayssd}!mlfarm!ron.

For programmers accustomed to writing for single-user systems, UNIX (and Xenix) holds some quick surprises. All those carefully optimized, hand-coded screens, the lightning-fast displays that write to the screen buffer, even "well-behaved" routines that rely on BIOS calls, are suddenly useless. Terminal displays, including the console, are treated as teletype devices under UNIX. To perform even the simplest screen display function, such as clearing the screen, the program must send the proper screen control sequence.
In effect, all screen displays are comparable to using the ANSI.SYS driver under MS-DOS. If the UNIX system had only a single terminal or if only one type of terminal were used on the system, it would be easy enough to hand-code the proper screen control sequences. Indeed, even if several different terminals are used on a system, the screen control sequences can be hand coded. For example, the function in Listing 1 could be used to clear screens. For a closed system where most of the output is teletype format, with only simple screen display commands, your programs may not need much more. But what if the system is not closed? What if there are outside logins using a variety of terminals? And what if you want to write screen displays that utilize a wide range of terminal capabilities, including automargins and optimized cursor motion, and make sure those displays are scaled to the size of the terminal display? And what if some of the terminals using the system require padding at certain speeds or have other quirks that make them unsuitable or tricky to use with fancy screen display programs? It is possible to keep adding options to code like Listing 1, but by the tenth terminal type, the code starts to look like linguini. The alternative is to use the termcap and terminfo databases of screen display parameters and control sequences which are provided with most UNIX systems. Termcap, which was developed at Berkeley, uses an ASCII database; the terminfo database is compiled. A curses library of screen display and terminal input functions is supplied with both systems. Terminfo is theoretically faster; it supports many terminal capabilities which are normally not encoded into the termcap database, and the curses library supplied with terminfo has many capabilities which are not supported under termcap curses. 
The termcap database is substantially easier to modify, and there are ways to incorporate many of the capabilities of the terminfo curses into programs running on termcap systems. This article will discuss only termcap, which is used by Xenix and by most BSD systems. The UNIX documentation describes the termcap routines as "low level" and the curses routines as "higher level," in much the way that troff/nroff is a low level formatting package, and the formatting macro packages (MM or MS) are high level. Actually, the analogy is not really appropriate. Curses is a screen optimization package with some convenient windowing functions. Termcap is a straightforward package of functions to access the database of screen and keyboard control sequences. The termcap database is normally in the file /etc/termcap. Comments in the file are prefaced with a # character. All lines which do not begin with the # are considered part of the database. Each entry in the database represents a different terminal. The entry begins with alternate names of the terminal, separated by '|' characters. Usually the first name listed for the terminal is a special two-character abbreviation, used by some older programs. The second name is used by most utilities, such as the editor vi. The last name listed is the full name of the terminal, and is the only name which can have blanks inserted for readability. Thus: d1|vt100|vt-100|pt100|pt-100|dec vt100: are the names of a DEC vt-100. If you add terminal descriptions to the termcap database, make sure that every name in your addition is unique. The capabilities of the terminal are listed after the name, separated from one another by colons. Newlines in the entry must be escaped with a backslash. The capabilities are strings, boolean, or integers. Most are mnemonic. Boolean capabilities are true if named. Strings follow an equals sign (=). Integers follow a #.
There are no spaces or tabs within capabilities or between them, and an entry carried to a second line must repeat the :. Thus: MT|myterm|My Special Terminal:\ :bs:am:cl=\E[J:ho=\E[H:lines#24: indicates that myterm can backspace (bs), has automatic margins (am), that there are 24 lines displayed on the screen, and gives the sequences that should be sent to clear the screen (cl) and home the cursor (ho). Several special sequences are used to encode the strings: \E is the escape character (0x1b); ^X is "Control-X" or any other control key; \n, \r, \t, \b, and \f are newline, carriage return, tab, backspace, and form feed; \^ is ^, and \\ is \. All non-printing characters may be represented as octal escapes; the :, which is used to separate capabilities in each entry, must be entered as \072 if used in a string. Null characters can be entered as \200 because the routines that process termcap entries strip the high bits of the output late, so that \200 comes out \000. Padding can be encoded into the strings by prefacing the string with an integer, representing milliseconds of delay. An integer and a * indicate that the delay is proportional to the number of lines involved in the execution of the command. When the * is used, the delay can be stated in tenths of a millisecond, so that 3.5* before the string for ce (clear to end of line) would mean that the command requires 3.5 milliseconds of padding for each line that is to be cleared. Terminals which are identical to another entry with few exceptions can make use of the tc string and the @ negator. NT|newterm|My alternate terminal:lines#25:bs@:tc=vt100: describes a terminal with 25 lines, no backspace capability, but otherwise identical to a vt100. One caution in using entries with tc encoding: programs with a fixed stack (such as Xenix 286) may crash when reading tc encoded entries. The cure is to make the stack larger with the -F option on the compile command line.
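The capability syntax just described can be made concrete with a toy version of the library's tgetnum(), which scans an entry for a numeric capability and returns its value. This is an illustrative sketch only, under simplifying assumptions: it works on a single entry string already in memory, and unlike the real routine it does not follow tc= links or consult the TERMCAP environment variable; toy_tgetnum is a made-up name:

```c
#include <string.h>

/* Toy tgetnum(): scan a termcap entry string for a numeric
 * capability such as "lines#24" and return its value, or -1 if the
 * capability is absent.  Splitting on ':' is safe because a literal
 * colon inside a capability string must be written as \072. */
int toy_tgetnum(const char *entry, const char *id)
{
    size_t n = strlen(id);
    const char *p = entry;

    while ((p = strchr(p, ':')) != NULL) {
        ++p;                              /* capability starts after ':' */
        if (strncmp(p, id, n) == 0 && p[n] == '#') {
            int val = 0;
            for (p += n + 1; *p >= '0' && *p <= '9'; ++p)
                val = val * 10 + (*p - '0');
            return val;
        }
    }
    return -1;
}
```

A boolean lookup (tgetflag) differs only in testing for ':' after the name, and a string lookup (tgetstr) in testing for '=' and copying up to the next unescaped colon.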
The cursor addressing string (cm) is coded with printf-like escapes. These are described in detail in the termcap (M) entry in the UNIX documentation. In addition to the regular termcap capabilities, which begin with lower case letters, some UNIX systems utilize extensions. Xenix uses a variety of upper case termcap entries to indicate special PC keys: PU for Page Up, EN for End, GS for Start-Character-Graphics-Mode, and pseudo-mnemonics for eight-bit PC graphics drawing characters. GNU Emacs uses upper- case capabilities to describe terminal command sequences which are not generally used in termcap, such as AL and DL for adding and deleting multiple lines. Programs which use these extended termcap capabilities may not be portable to other UNIX systems. The termcap library provides functions to retrieve the encoded information from the database. The termcap routines first search the environment for a TERMCAP variable. If it is found, does not begin with a slash, and the terminal type matches the environment string TERM, the TERMCAP string is read. If it begins with a slash, it is read as the pathname of the termcap database (instead of the default /etc/termcap). Using the environment variable instead of searching the database will speed up the development of new termcap entries. If your system has a tset command which supports separate TERM and TERMCAP environment entries, it will also speed the startup of programs which use termcap. One obvious use for the termcap database is in displaying formatted text to the screen. Although there are wordprocessing programs available to run under UNIX and/or Xenix, much text processing in UNIX systems is done by using an editor (vi or emacs) to prepare the text with nroff/troff formatting codes, usually with one of the macro packages such as MM. The formatted file is then piped to a printer or type-setter, or to a screen display for proofing. 
Although it is possible to prepare nroff terminal driving tables to encode the screen control sequences needed for such formatting features as bold type, italics or underlining, a different table would have to be encoded and compiled for each terminal, and the user would have to indicate the terminal type on the nroff command line: nroff -cm -Tmyterm myfile Also, the nroff terminal driving table format was created when daisy-wheel printers were the cutting edge of desktop hardcopy capabilities, and the coding is sometimes awkward to adapt to the capabilities of a terminal display. For simple text formatting, it is easier to parse the default nroff output, which uses backspaces and overstrikes to generate underlined or bold characters, and use termcap to look up the appropriate standout (bold) and underline sequences. The program in Listing 2 (Bold.c) uses termcap library functions to look up the terminal screen control sequences for so and se (standout start and standout end), us and ue (underline start and underline end), and sg, which is an integer coded quantity indicating how many spaces the attribute change to standout mode requires. For terminals with multiple fonts, the switchover to italic font could be encoded in us, so that underlined text would be displayed in italics. A bold screen attribute could be encoded in so and se, so that bold text would be displayed in bold font, instead of in reverse video. Alternately, new termcap entries could be created to hold the screen control sequences for bold or italic fonts. The termcap access functions are simple and straightforward. To parse the database, you need to allocate a buffer of 1024 characters (tbuf in Listing 2), to hold the entire termcap entry as it is retrieved by tgetent(). This buffer must be retained through all calls to the three functions which parse capabilities: tgetstr(), tgetflag(), and tgetnum().
Another buffer (sbuf in Listing 2) should be allocated for the strings which will be retrieved by tgetstr(). This should be a static buffer. The tgetstr() function is passed the address of a pointer to this buffer. As string capabilities are read, they are placed in the buffer, and the pointer is advanced. Using a static buffer saves the overhead of allocating space for each string as it is retrieved. The termcap library also provides a function tputs(), which correctly sends screen control sequences to the display, including any needed padding. tputs() requires a pointer to a user-supplied function which can display a single character. The function prch() (Listing 2) invokes the macro putchar(). Although it is not used here, the termcap library includes one other function, tgoto(), which uses the cm (cursor movement) string to go to a desired column and line. Because tgoto() will output tabs, programs which make tgoto() calls should turn off the TAB3 bit when setting the line protocol. The function putout() in Listing 2 is not really necessary. It is used here to check for insertions of ^G (0x7) in the text files. ^G was chosen because it passes through nroff transparently. It is used to trigger expanded font in files sent to the printer. In Bold.c, it triggers the insertion of a space between characters to simulate expanded font. Termcap can also be used to retrieve the sequences sent by non-ASCII keys, like the arrow or function keys. Although the termcap curses library does not use the arrow or function keys, the keys can be added to programs which use curses for screen control by making a second set of termcap calls (curses makes its own calls to termcap), and then reading for the arrow or function key sequences in a getkey() routine (see Listing 3, keys.c).
Reading arrow keys for terminals which use a single character code for each arrow (such as ^H, ^J, ^K, ^L) is simple, but many terminals, such as the PC console, send escape-prefaced strings (ESC[A, ESC[B, etc.) when the arrow keys or other non-ASCII keys are pressed. Some systems may balk at reading strings with a simple read() system call. It is worth fiddling with the VMIN and VTIME values in structure termio if you cannot read key sequences with the code in getkey(). The values in function fixquit() in Listing 3 are a good start. The alternative is to put the strings together out of characters read one at a time. This may be the most reliable technique for an editor or other program that reads repeated sequences of fast input characters that might be misinterpreted, such as an ESC followed by a [ and an alphabetic character, which an ANSI terminal might interpret as a screen control sequence. The trick if you are reading a character at a time is to distinguish between a lone ESC (0x1b) and an ESC sent as the first character of an escape sequence. One technique is to set a timeout alarm. If you get the characters that would constitute a key string before the timeout, return the key string, otherwise return an ESC followed by individual characters. The whole procedure takes tinkering, and fast typists can foul it up. Hence, using a read() call is simpler. One problem that can arise with the arrow keys is that ^\, the UNIX "quit" character, is used as an arrow key on some terminals. Even if the "quit" signal is disabled, the keys will still be intercepted. The easiest fix to the problem is to change the "quit" key to an impossible value. The function fixquit() does this. The global variable ttytype is set by the curses termcap routines, which in this program are called before lookupkeys(). The ttytype could be set by a call to getenv(), as in the code for Listing 1.
The header file in Listing 4 (keys.h) defines integer equivalents for the arrow and function keys; these defines can be used in switch statements. (The values given are those used in the terminfo header files.) What termcap cannot do is to optimize screen output by cutting down the overhead of repeated cursor movement sequences. The output routines in the curses library do a fair job and are simple to use. The code for life.c in Listing 5 uses these routines along with the arrow key routines from keys.c, and while the speed of output cannot compare with an optimized routine writing directly to screen memory, it is quick enough on a console or a terminal running at 19,200 baud. Highly optimized screen output which requires even more efficiency could mean a journey into the treacherous code of screen display routines which calculate the cost of each move. One such package is the display routines in the Gosling Emacs code, which quite properly carries a dire warning to those who would venture into the tangles of the code. Listing 1 cls () { char *getenv(), *term, *cl; term = getenv ("TERM"); if (!strcmp(term, "ansi")) cl = "\033[2J\033[H"; else if (!strcmp(term, "wy50")) cl = "\033*"; /* add other terminals ...
*/ /* if all else fails, try a form feed */ else cl = "\f"; fputs (cl, stderr); } Listing 2 /* * Bold.c - filters nroff output for terminal display * displays bold in SO, underlines, expanded font * copyright 1987 Ronald Florence */ #include <stdio.h> #define UL 01 #define BOLD 02 #define ULSTOP 04 #define Bold() tputs(so, 1, prch), att |= BOLD #define Stopbold() tputs(se, 1, prch), att &= ~BOLD #define Uline() tputs(us, 1, prch), att |= UL #define Stopuline() tputs(ue, 1, prch), att &= ~(UL | ULSTOP) prch(c) register char c; { putchar(c); } char *so, *se, *us, *ue; main() { static char sbuf[256]; char tbuf[1024], *fill = sbuf, *tgetstr(), *getenv(); register a, c; int i, att = 0; if (tgetent(tbuf, getenv("TERM")) == 1 && tgetnum("sg") < 1) { so = tgetstr("so", &fill); se = tgetstr("se", &fill); us = tgetstr("us", &fill); ue = tgetstr("ue", &fill); } a = getchar(); while ((c = getchar()) != EOF) { if (a == '_') { if (c == '_' && (att & UL) == 0) Uline(); else if (c == '\b') /* nroff italics */ { if ((a = getchar()) == EOF) a = 0; c = getchar(); if ((att & UL) == 0) Uline(); } if (c != '_' && (att & UL)) /* c is the last underline */ att |= ULSTOP; } else if (c == '\b') { if ((c = getchar()) != a) { /* Not a bold: print the character and pass the \b to be printed.
*/ putout(a); a = '\b'; } else { if ((att & BOLD) == 0) Bold(); for (i = 0; i < 5; i++) if ((c = getchar()) != a && c != '\b') break; } } else if (att & BOLD) Stopbold(); putout(a); if (att & ULSTOP) Stopuline(); a = c; } } putout(c) register char c; { static int expanded; if (c == 07) /* ^G signals expanded font */ { expanded++; return(0); } putchar(c); if (expanded) { if (c == '\n') expanded = 0; else putchar(' '); } } Listing 3 /* * keys.c - gets arrow and function keys from termcap, * returns terminfo codes * changes quit key for use as arrow * * define NO_SYSV for versions of curses that do not look up * arrow & function keys from termcap * * copyright 1988 Ronald Florence * changed VMIN & VTIME for wy99 @ 9600 ron@mlfarm (7/11/88) */ #include <curses.h> #ifndef KEY_DOWN #include "keys.h" #endif #define NKEYS 16 char #ifdef NO_SYSV *tcap_ids[] = { "kd", "ku", "kl", "kr", "kh", "kb", "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7", "k8", "k9", 0 }, #endif *fkeys[NKEYS]; lookupkeys() { #ifdef NO_SYSV static char sbuf[256]; char **key, tbuf[1024], *fill = sbuf, *tgetstr(); int i = 0; tgetent(tbuf, ttytype); for (key = tcap_ids; *key; ++key) fkeys[i++] = tgetstr(*key, &fill); #else fkeys[0] = KD; fkeys[1] = KU; fkeys[2] = KL; fkeys[3] = KR; fkeys[4] = KH; fkeys[5] = KB; fkeys[6] = K0; fkeys[7] = K1; fkeys[8] = K2; fkeys[9] = K3; fkeys[10] = K4; fkeys[11] = K5; fkeys[12] = K6; fkeys[13] = K7; fkeys[14] = K8; fkeys[15] = K9; #endif fixquit(); } getkey() { char cmd[7]; register k; k = read(0, cmd, 6); cmd[k] = '\0'; for (k = 0; k < NKEYS; k++) if (strcmp(cmd, fkeys[k]) == 0) return (k + KEY_DOWN); return ((int) *cmd); } fixquit() { struct termio new; ioctl(0, TCGETA, &new); new.c_cc[VQUIT] = 0xff; /* in case QUIT is an arrow */ new.c_cc[VTIME] = 1; /* minimum timeout */ new.c_cc[VMIN] = 3; /* three characters satisfy */ ioctl(0, TCSETA, &new); } Listing 4 /* * keys.
h * copyright 1988 Ronald Florence * * use with curses programs that need extended keyboard * (if tcap.h does not include the defines) */ #define KEY_DOWN 0402 #define KEY_UP 0403 #define KEY_LEFT 0404 #define KEY_RIGHT 0405 #define KEY_HOME 0406 #define KEY_BACKSPACE 0407 #define KEY_F0 0410 #define KEY_F(n) (KEY_F0 + (n)) Listing 5 /* life.c copyright 1985, 1988 Ronald Florence compile: cc -O -s life.c keys.c -lcurses -ltermcap -o life */ #include <curses.h> #include <signal.h> #ifndef KEY_DOWN #include "keys.h" #endif #define ESC 0x1b #define life '@' #define crowd (life + 4) #define lonely (life + 2) #define birth (' ' + 3) #define minwrap(a,d) a = --a < 0 ? d : a #define maxwrap(a,d) a = ++a > d ? 0 : a #define wrap(a,z) if (a < 0) (a) += z; \ else if (a > z) (a) = 1; \ else if (a == z) (a) = 0 #define MAXX (COLS-1) #define MAXY (LINES-3) #define boredom 5 typedef struct node { int y, x; struct node *prev, *next; } LIFE; struct { int y, x; } pos[8] = { { 1,-1}, {1, 0}, {1, 1}, {0, 1}, {-1, 1}, {-1, 0}, {-1,-1}, { 0,-1} }; LIFE *head, *tail; extern char *malloc(); char *rules[] = { " ", "The Rules of Life:", " ", " 1. A cell with more than three neighbors", " dies of overcrowding.", " 2. A cell with less than two neighbors", " dies of loneliness.", " 3.
A cell is born in an empty space", " with exactly three neighbors.", " ", 0 }, *rules2[] = { "Use the arrow keys or the vi cursor keys", "(H = left, J = down, K = up, L = right)", "to move the cursor around the screen.", "The spacebar creates and destroys life.", " starts the cycle of reproduction.", " ends life.", " ", "Press any key to play The Game of Life.", 0 }; main(ac, av) int ac; char **av; { int i = 0, k, die(); initscr(); crmode(); noecho(); signal(SIGINT, die); lookupkeys(); head = (LIFE *)malloc(sizeof(LIFE)); /* lest we have an unanchored pointer */ tail = (LIFE *)malloc(sizeof(LIFE)); head->next = tail; tail->prev = head; if (ac > 1) readfn(*++av); else { erase(); if (COLS > 40) for ( ; rules[i]; i++) mvaddstr(i+1, 0, rules[i]); for (k = 0; rules2[k]; k++) mvaddstr(i+k+1, 0, rules2[k]); refresh(); while (!getch()) ; setup(); } nonl(); while (TRUE) { display(); mark_life(); update(); } } die() { signal(SIGINT, SIG_IGN); move(LINES-1, 0); refresh(); endwin(); exit(0); } kill_life(ly, lx) register int ly, lx; { register LIFE *lp; for (lp = head->next; lp != tail; lp = lp->next) if (lp->y == ly && lp->x == lx) { lp->prev->next = lp->next; lp->next->prev = lp->prev; free(lp); break; } } display() { int pop = 0; static int gen, oldpop, boring; char c; register LIFE *lp; erase(); for(lp = head->next; lp != tail; lp = lp->next) { mvaddch(lp->y, lp->x, life); pop++; } if (pop == oldpop) boring++; else { oldpop = pop; boring = 0; } move(MAXY+1, 0); if (!pop) { printw("Life ends after %d generations.", gen); die(); } printw("generation - %-4d", ++gen); printw(" population - %-4d", pop); refresh(); if (boring == boredom) { mvprintw(MAXY, 0, "Population stable. Abort? "); refresh(); while (!(c = getch())) ; if (toupper(c) == 'Y') die(); } } mark_life() { register k, ty, tx; register LIFE *lp; for (lp = head->next; lp; lp = lp->next) for (k = 0; k < 8; k++) { ty = lp->y + pos[k].y; wrap(ty, MAXY); tx = lp->x + pos[k].x; wrap(tx, MAXX); stdscr->_y[ty][tx]++; } } update() { register int i, j, c; for (i = 0; i <= MAXY; i++) for (j = 0; j <= MAXX; j++) { c = stdscr->_y[i][j]; if (c >= crowd || c >= life && c < lonely) kill_life(i, j); else if (c == birth) newlife(i, j); } } setup() { int x, y, c, start = 0; erase(); y = MAXY/2; x = MAXX/2; while (!start) { move(y, x); refresh(); switch (c = getkey()) { case 'h' : case 'H' : case ('H' - '@'): case KEY_LEFT: case KEY_BACKSPACE: minwrap(x, MAXX); break; case 'j' : case 'J' : case ('J' - '@'): case KEY_DOWN: maxwrap(y, MAXY); break; case 'k' : case 'K' : case ('K' - '@'): case KEY_UP: minwrap(y, MAXY); break; case 'l' : case 'L' : case ('L' - '@'): case KEY_RIGHT: maxwrap(x, MAXX); break; case ' ' : if (inch() == life) { addch(' '); kill_life(y, x); } else { addch(life); newlife(y, x); } break; case 'q' : case 'Q' : case ESC : ++start; break; } } } newlife(ny, nx) int ny, nx; { LIFE *new; new = (LIFE *)malloc(sizeof(LIFE)); new->y = ny; new->x = nx; new->next = head->next; new->prev = head; head->next->prev = new; head->next = new; } readfn(f) char *f; { FILE *fl; int y, x; if ((fl = fopen(f, "r")) == NULL) errx("usage: life [file (line/col pts)]\n", NULL); while (fscanf(fl, "%d%d", &y, &x) != EOF) { if (y < 0 || y > MAXY || x < 0 || x > MAXX) errx("life: invalid data point in %s\n", f); mvaddch(y, x, life); newlife(y, x); } fclose(fl); } errx(m,d) char *m, *d; { fprintf(stderr, m, d); endwin(); exit(0); } Fitting Curves To Data Michael Brannigan Michael Brannigan is President of Information and Graphic System, IGS, 15 Normandy Court, Atlanta, GA 30324 (404) 231-9582. IGS is involved in consulting and writing software in computer graphics, computational mathematics, and data base design.
He is presently writing a book on computer graphics algorithms. He is also the author of The C Math Library EMCL, parts of which are the routines set out here. Fitting curves to data ranks as one of the most fundamental needs in engineering, science, and business. Curve fitting is known as regression in statistical applications, and nearly every statistical package, business graphics package, math library, and even spreadsheet software can produce some kind of curve from given data. Unfortunately, the process and the underlying computational mathematics are not sufficiently understood, even by the software firms producing the programs. It is not difficult, for example, to construct input data for which the linear regression routine of a well known statistical package (which I shall not name), used on micros and mainframes, produces incorrect output. Constructing a functional approximation to data (the formal act known as curve fitting) involves three steps: choosing a suitable curve, analyzing the statistical error in the data, and setting up and solving the required equations. Choosing a suitable curve is a mixture of artistic sensibility and a knowledge of the data and where it comes from. Analyzing statistical error can be something of a guessing game and requires some thought. Setting up and solving the equations is computationally the most interesting. It is here that many programs fail because they use computationally unstable methods, but more of that later. The number of methods for data fitting is legion and we suggest some in this article. However, we give only one method in full and consider only 2-D data. Anyone interested in other specific data fitting techniques may contact the author. Problem Given data points (xi, yi), i = 1,...,n, we suppose there exists a relation yi = F(xi) + ei, i = 1,...,n where F(x) is an unknown underlying function and ei represents the unknown errors in the measurements yi. The problem is to find a good approximation f(x) to F(x).
We thus need a function f to work with and some idea, however minimal, of the errors.

How To Choose f

f will have variable parameters whose correct values (the values that solve the approximation problem) are found by solving a system of equations, each data point defining one equation. We call the function f linear or non-linear if it is linear or non-linear in its parameters. Consider some of the general principles involved in choosing a suitable function f. We must have more data points than parameters; otherwise f will fit the data exactly and we will not model the errors. Unless absolutely necessary, don't use a non-linear f; solving systems of non-linear equations uniquely is, except for special cases, nearly impossible. In most cases polynomials are not a good choice; they are wiggly curves and nearly always wiggle in the wrong places. The best option in most cases is to use piecewise polynomials. The example we give is a piecewise cubic polynomial whose first derivatives are continuous everywhere. (You can, of course, use cubic splines if you want second derivatives to be continuous, but in most cases the example set out here is superior for a general-purpose curve fitting routine. If you want the full cubic spline, use the B-spline formulation and no other; otherwise you get unstable systems of equations resulting in incorrect solutions. To use the B-spline formulation for spline approximation, you need only change the routine coeff_cubic() in the program given in this article. The system of equations is solved by the same routines.) Once f has been chosen and applied to each data point, we obtain a system of linear equations to solve, where the number of equations will be greater than the number of unknowns. Such a system is called an overdetermined system, and no exact solution exists -- which is exactly what we want.
However, overdetermined systems have an infinite number of inexact (approximate) solutions; we will seek an approximation that minimizes some particular error measure. (Mathematicians call these error measures "norms"; thus the problem of curve fitting becomes an optimization problem.) Of the infinite possible norms, three should be considered for any curve fitting package: the L1-norm, the L2-norm (least squares norm), and the minimax (Chebyshev) norm. (These norms are defined later in this article.) Fortunately good algorithms exist for solving overdetermined systems of linear equations in all three norms. For the L1-norm and the minimax norm, you use a variation of the simplex method of linear programming; for the L2-norm you use a QR decomposition of the matrix in preference to the computationally unstable method of solving the normal equations. (We cannot give all the program code here as space is limited, but for more guidance the reader can contact the author.) Of many possible combinations the following solution is a good general-purpose option.

Solution

We have data points (x_i, y_i), i = 1,...,n. Let each x_i belong to some interval [a,b]. Specify k points X_j, j = 1,...,k, on the X-axis; we call these points knots. These knots are such that

    a = X_1 < X_2 < ... < X_k = b

We can now define our function as follows: for each x in the interval [X_j, X_j+1] define the cubic polynomial

    y = [(d^3 - 3dp^2(x) + 2p^3(x))Y_j + dp(x)q^2(x)Y_j'
         + (3d - 2p(x))p^2(x)Y_j+1 - dp^2(x)q(x)Y_j+1'] / d^3

where

    d = X_j+1 - X_j
    p(x) = x - X_j
    q(x) = X_j+1 - x

Thus y is a cubic polynomial with the linear parameters Y_j, Y_j+1, Y_j', Y_j+1', which are the values and first derivatives at the knots X_j and X_j+1 respectively. For each data point we obtain one linear equation, so we can set up n linear equations in the 2k unknowns Y_1, Y_1', ..., Y_k, Y_k'. In matrix form this can be written as AY = b, where A is a block diagonal matrix, Y is the vector of unknowns, and b is the vector of y values.
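The four basis coefficients in the cubic above can be checked numerically. The following sketch (my own illustration, mirroring the val[] computation in the article's app_cubic(); the function name is mine) evaluates them; at x = X_j it yields weights (1,0,0,0) and at x = X_j+1 weights (0,0,1,0), confirming the interpolation properties at the knots:

```c
#include <math.h>

/* Evaluate the four Hermite basis weights for x in [Xj, Xj1].
   val[0], val[2] multiply the knot values Yj, Yj+1;
   val[1], val[3] multiply the knot derivatives Yj', Yj+1'. */
void hermite_basis(double x, double Xj, double Xj1, double val[4])
{
    double d = Xj1 - Xj;         /* knot spacing            */
    double p = x - Xj;           /* p(x) in the article     */
    double q = Xj1 - x;          /* q(x) in the article     */
    double d3 = d * d * d;

    val[0] = (d3 - 3.0 * d * p * p + 2.0 * p * p * p) / d3;
    val[1] = d * p * q * q / d3;
    val[2] = (3.0 * d - 2.0 * p) * p * p / d3;
    val[3] = -d * p * p * q / d3;  /* note the minus sign */
}
```

The curve value at x is then the dot product of val[] with (Y_j, Y_j', Y_j+1, Y_j+1'), exactly as app_cubic() computes it.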
Because A is block-diagonal, for very large data sets optimal use should be made of the structured sparsity. With the same knots we could also define cubic B-splines and then fit a cubic spline to the data. We would again arrive at an overdetermined system of linear equations with a matrix of coefficients having block-diagonal structure. In fact the equations we have set out above form a cubic spline with each knot X_j a double knot.

Choosing A Norm

For each possible solution Y we have errors s_i, i = 1,...,n, such that AY - b = s, where s is the vector of s_i values. The L1-norm is defined to be

    sum over i of abs(s_i)

The L2-norm or least squares norm is

    (sum over i of s_i^2)^(1/2)

And the minimax or Chebyshev norm is

    max(abs(s_i) : i = 1,...,n)

We solve the overdetermined system of equations by finding that vector Y which minimizes one of these norms. The choice of norm depends on the unknown errors e_i, and we hope that the choice of norm will give errors s_i that mirror these unknown errors. The general rule is: choose the L1-norm if the e_i are scattered (belong to a long-tailed distribution); choose the L2-norm if the e_i are normally distributed; choose the minimax norm if the e_i are very small or belong to a uniform distribution. Research has indicated that data sets have errors nearer to the L1-norm than the L2-norm. (Errors in data are never normally distributed, neither as they are nor in the limit. The assumption of normally distributed errors is common in most packages; the user should question this assumption very carefully.) So when you don't know how the errors are distributed, use the L1-norm. The minimax norm is rarely used for fitting curves to experimental data. However, always use the minimax norm if you want to fit a function to another function, for example fitting a Fourier series to a complicated function where you know the values exactly. Whichever norm you choose, the computer solution of the equations is not straightforward.
You must choose an algorithm that is computationally stable. (A computationally unstable algorithm is one that is mathematically correct but, when fed into a computer, produces wrong answers -- for example, solving linear equations without pivoting, or solving quadratic equations from the well-known formula. So get some professional help in choosing the algorithm.)

Program

After you have spent some time analyzing your particular data fitting problem, decided upon a suitable function to approximate the data, and also decided upon the norm to use for the errors in the data, you must program the result. Unless your application requires special functions, the approximating function set out above is a good general-purpose choice. The programming for this function or any other has the same form: the system of equations is set up with one equation for each data point, and then the system is solved with the required norm. For the function described here the programming is just as straightforward. The main routine is Hermite(), named after the mathematician who defined these piecewise polynomials. The routine first gives the user the choice (by setting the variable flag) of either setting the k knots lambda[] on input or using the routine app_knots() to compute the knots. In most cases the user will not use the routine just once, but will compute a first approximation and then alter the position of the knots for a second approximation. For a first approximation set flag to true and use app_knots() to compute the knots automatically. Then look at the result and choose new knots. A more sophisticated method automatically chooses the number of knots k and their position. Once the knots are defined the routine allocates space for the matrix A of size n x 2k. After making sure all elements of the matrix are zero, the routine calls coeff_cubic() to set up the coefficients of the matrix. Now the program solves the overdetermined system in the appropriate norm.
The variable norm is set by the user to indicate which norm to use. (We do not give here the three routines that solve the overdetermined system of equations, as they require lots of space, but the reader can find the algorithms in most computational mathematics textbooks.) The routine L1_approx() uses the L1-norm, the routine CH_lsq() uses the least squares norm, and the routine Cheby() uses the minimax norm. With the solution from the appropriate routine, the function now fits the data. Some words on the other routines. First, the routine app_knots() will compute k knots lambda[j] so that in each interval (lambda[j], lambda[j+1]) there are approximately the same number of x values. This is a good starting point for our Hermite approximation and for any spline approximation that needs knots. The routine coeff_cubic() is merely a direct translation of the formulae. This routine uses interval(), which finds to which knot interval each x value belongs. coeff_cubic() also uses the macro SUB() to move around the matrix (this is my preferred method for dealing with matrices passed as parameters). Finally there is the routine app_cubic(). This routine uses the results from Hermite() to compute the piecewise polynomial for any value of x. Thus app_cubic() completes the curve fitting problem.

Example

An example (using data from actual measurements of the horizontal angular displacement of an elbow flexion against time) will show how the pieces fit together. There are 142 measured points and these measurements are quite accurate (the experimenters knew the kind of instruments they were using -- see the paper by Pezzack, et al). In this instance a close fit to the data points is required. In all the figures the dark circles are the knots and the crosses are the data points. The solution is in Figure 1. Figure 2 shows the result when the L2-norm is used. Figure 3 shows the result when the minimax norm is used.
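The equal-count knot placement that app_knots() performs can be sketched as follows (my own simplified version, assuming the x values are sorted ascending and k >= 3; the article's routine differs in detail):

```c
/* Place k knots over sorted data x[0..n-1] so that each interior
   knot interval contains roughly the same number of data points.
   The end knots are pinned to the data range. */
void equal_count_knots(const double *x, int n, double *lambda, int k)
{
    int j, step = n / (k - 1);   /* points per interval, roughly */

    lambda[0] = x[0];
    for (j = 1; j < k - 1; j++)
        /* put the knot midway between two adjacent samples so
           no data point falls exactly on a knot */
        lambda[j] = 0.5 * (x[j * step - 1] + x[j * step]);
    lambda[k - 1] = x[n - 1];
}
```

Balanced point counts per interval keep every block of the matrix A well populated, which is why this is a sensible default before hand-tuning the knots.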
As would be expected with such "clean" data, the answers are all quite good, the best being Figure 1. To illustrate the behavior in the presence of noise, add some significant errors to the same data points. Using the same curve approximation method, Figure 4, Figure 5, and Figure 6 show the results when using the L1-norm, L2-norm, and minimax norm respectively. As theory suggests, the L1-norm gives definitely superior results. This example is a straightforward application of the method set out here -- well, nearly! You may be asking the six thousand dollar question, "How do I choose the knots?" The answer is not straightforward and contemporary research has different answers. As you can see from the figures, the number and position of the knots changes for each example. The goal is to choose the number of knots and their position so as to give the best fit possible for the norm chosen -- easy to say but not easy to compute. All the knots in each figure have been chosen according to an information theoretic criterion, plus a little experience on the placement of knots. The idea behind this method is to attempt to extract the maximum amount of information from the data points until only error remains. To do this we need a computable value for the amount of information contained in the errors s_i; we suggest using the Akaike Information Criterion. The routine changes the number of knots and their position until there is no more information in the errors. For those readers who wish to go further into this problem, see the papers by Brannigan for a full mathematical treatment of this method, the information theoretic criterion, and an extension to multivariate data.

Bibliography

Pezzack, J.C. et al. "An Assessment of Derivative Determining Techniques Used for Motion Analysis," J. Biomechanics, 10 (1977).

Brannigan, M. "An Adaptive Piecewise Polynomial Curve Fitting Procedure for Data Analysis," Commun. Statist. A10(18), (1981).

Brannigan, M.
"Multivariate Data Modelling by Metric Approximants," Comp. Stats. & Data Analysis 2, (1985). Figure 1 Figure 2 Figure 3 Figure 4 Figure 5 Figure 6 Listing 1 Coeff_Cubic void coeff_cubic (a,p,q,x,y, lambda,k) /* * Set up the equations for the Hermite cubic approximation. */ double *a,*x,*y,*lambda; int p,q,k; { double d,alpha,beta,d3,alpha3; int i,j,col; for (i=0; i x[j-1]) ? 1:0; return(j); } double app_cubic (x,j,lambda,res) /* * Given the result res[] from the routine Hermite() find the value * of y for the given x value. */ double x,*lambda,*res; int j; { double d,alpha,beta,d3,alpha3,sum,val[4]; int i, col; col = 2*(j-1); d = lambda[j] - lambda[j-1]; alpha = x-lambda[j-i]; beta = d-alpha; d3 = d*d*d; alpha3 = alpha*alpha*alpha; val[0] = (d3-3.O*d*alpha*alpha+2.0*alpha3)/d3; val[1] = d*alpha*beta*beta/d3; val[2] = (3.0*d-2.0*alpha)*alpha*alpha/d3; val[3] = -d*alpha*alpha*beta/d3; for (sum=0.0,i=0; i<4; i++) sum += val[i]*res[col+i]; return (sum); } Hermite (Listing 2) #define SUB(i,j,k) (i)*(k)+(j) double Hermite (x,y,n,norm,lambda,k,flag,res,err) /* * Given n data points (x[],y[]) find the Hermite cubic approximation * to this data using the k nots lambda[]. If flag = true then find the * knots from the routine app_knots() otherwise lambda[] is set by the * user. The 2k result is returned in res[] and the error at each point * is returned in err[].The overdetermined system of equations is * solved with respect to the value of norm, uses L1-norm if norm = 1, * uses the L2-norm if norm = 2, and uses the minimax norm if norm = 3. * The return value z is the size of the resultant norm. */ double *x,*y,*lambda,*res,*err; int n,norm,k,flag; { double *a,z; int i,j,l,kk,m,m2; /* * Find whether the knots are to be computed. */ if (flag) app_knots (x,n,lambda,k); /* * Now form the system of equations one equation per data point. */ m2 = n*2*k; /* * Allocate space for the matrix. 
     */
    a = (double*)calloc(m2,sizeof(double));
    if (a==0)
        printf ("\n NO DYNAMIC SPACE AVAILABLE");
    else {
        for (i=0; i2) { i = n/(k-1); j = (n-(i*(k-3)))/2; lambda = x[j]; if (k>3) { s = j; for (t=2; t

#include <stdio.h>
#include <graphics.h>

#define M_POINTER 0     /* mouse shapes */
#define M_CROSS   1
#define ON  1
#define OFF 0
#define MAX_OBJECT 100
#define ESC 27
#define BOX     'b'     /* object types we support */
#define ELLIPSE 'e'
#define LINE    'l'
#define TEXT    't'
#define M_MAIN 1        /* handles for the menus */
#define M_FILE 2
#define M_OBJ  3
#define M_ACT  4
#define A_COPY 1        /* action requests for button() */
#define A_MOVE 2
#define A_EDIT 3
#define min(a,b) ((a) < (b) ? (a) : (b))
#define max(a,b) ((a) > (b) ? (a) : (b))

typedef struct {
    int type, l, t, r, b;
    char select, *data;
} Object;

Object objects[MAX_OBJECT]; /* the table of objects defined so far */
int last_object;            /* the end of the object table */
int map[] = {               /* maps a M_OBJ menu item to an object */
    0, BOX, ELLIPSE, LINE, TEXT
};
char *about =               /* form used on the M_MAIN About item */
    " Draw This! by Mark A.
Johnson %{continue}";
char *help =                /* help message for wrong keyboard input */
    "quit refresh : box line ellipse text : delete copy move edit";
char filename[20];          /* save the filename we're working with */
char text[100];             /* extra buffer for text i/o */
int actn_obj = 0;           /* flag for button(), some action req */
int make_obj = 0;           /* flag for button(), need to create */
int slct_cnt = 0;           /* count of selected objects */
int first;                  /* helps make_object collect points */
int grid = 0;               /* grid displayed, snap coords */
extern int Maxx, Maxy, MaxColor;

/* start routine, called by the application driver, gets things going */
start(argc, argv)
char **argv;
{
    add_menu(M_MAIN, "Main:AboutQuitRefreshGrid");
    add_menu(M_FILE, "File:ReadWriteSavePrint");
    add_menu(M_OBJ, "Objects:BoxEllipseLineText");
    add_menu(M_ACT, "Actions:DeleteCopyMoveEdit");
    menu_state(M_ACT, 0);
    if (argc > 1) {
        strcpy(filename, argv[1]);
        read_objects();
    }
}

/* no timers in this application, but DCUWCU needs an entry anyway */
timer() {}

/* button routine called every time button 1 is depressed */
button(b, x, y)
{
    if (make_obj) {             /* need points to make an object */
        make_object(x, y);
    } else if (actn_obj) {      /* got a point for a copy or move */
        action_object(x, y);
    } else {                    /* do a selection */
        select_object(in_object(x, y));
    }
    check_menu();
}

/* menu routine called every time a menu item is selected */
menu(m, i)
{
    char junk = 0, on;

    switch (m) {
    case M_MAIN:                /* main menu */
        switch (i) {
        case 1: form(about, &junk); break;
        case 2: quit(); break;
        case 3: refresh(); break;
        case 4: do_grid(); break;
        }
        break;
    case M_FILE:                /* file menu */
        if (i < 3 && !get_name())
            break;
        switch (i) {
        case 1: read_objects(); break;
        case 2:
        case 3: write_objects(); break;
        case 4: print(); break;
        }
        break;
    case M_OBJ:                 /* objects */
        start_make(map[i]);
        break;
    case M_ACT:                 /* actions */
        switch (i) {
        case 1: kill_object(); break;
        case 2: start_actn(A_COPY); break;
        case 3: start_actn(A_MOVE); break;
        case 4: start_edit(); break;
        }
        break;
    }
}

/* routine called every time a key is struck */
keyboard(c)
{
    switch (c) {
    case 'p': print(); break;
    case 'g': do_grid(); break;
    case 'r': refresh(); break;
    case 'q': quit(); break;
    case 'b': start_make(BOX); break;
    case 't': start_make(TEXT); break;
    case 'l': start_make(LINE); break;
    case 'm': start_actn(A_MOVE); break;
    case 'c': start_actn(A_COPY); break;
    case 'd': kill_object(); break;
    case 'e':
        if (slct_cnt)
            start_edit();
        else
            start_make(ELLIPSE);
        break;
    default:
        msg(help);
    }
}

/* time to go, see if they really want to */
quit()
{
    char yes = 0, no = 0;
    char *f_exit = "Are you sure? %{yes} %{no}";

    if (form(f_exit, &yes, &no) && no == 0)
        finish();
}

/*
 * miscellaneous support routines
 */

/* reset the current grid size */
do_grid()
{
    char gridval, ok = 0, nok = 0, x;

    switch (grid) {
    case 8:  gridval = 1; break;
    case 16: gridval = 2; break;
    default: gridval = 0; break;
    }
    x = form("Change Grid Size %[none:8:16]%{ok} %{cancel}",
        &gridval, &ok, &nok);
    if (x == 0 || nok)
        return;
    grid = gridval * 8;
    refresh();
}

/* print the current screen somewhere, Epson-compatible graphics mode */
print()
{
    static char grhd[] = { ESC, 'L', 0, 0 };        /* 960 bit graphics */
    static char grlf[] = { ESC, 'J', 24, '\r' };    /* line feed */
    static char prbuf[960];
    int x, y, i, b, n, any, pixel, max;

    max = min(Maxx, 960);
    grhd[2] = max;
    grhd[3] = max >> 8;
    mouse_state(OFF);
    b = 0x80;
    any = 0;
    for (y = 0; y < Maxy; y++) {
        for (x = 0; x < max; x++) {
            if (getpixel(x, y)) {
                any = 1;
                prbuf[x] |= b;
            }
        }
        b >>= 1;
        if (b == 0) {       /* out it goes */
            if (any) {
                prn(grhd, 4);
                prn(prbuf, max);
            }
            prn(grlf, 4);
            b = 0x80;
            any = 0;
            for (x = 0; x < max; x++)
                prbuf[x] = 0;
        }
    }
    mouse_state(ON);
}

/* print the n bytes out the printer port */
prn(s, n)
char *s;
{
    while (n--)
        biosprint(0, *s++, 0);
}

/* select or de-select an object */
select_object(obj)
{
    int i;
    Object *o;

    if (obj == -1) {    /* de-select all */
        for (i = 0; i < last_object; i++) {
            o = &objects[i];
            if (o->select) {
                o->select = 0;
                highlight(o, 0);
            }
        }
        slct_cnt = 0;
    } else {
        o = &objects[obj];
        o->select = !o->select;
        highlight(o, o->select);
        slct_cnt += o->select ? 1 : -1;
    }
}

/* get a filename from the user, return 0 if abort */
get_name()
{
    return form("Path: %20s", filename);
}

/* based on current select state, set the top-most menu */
check_menu()
{
    menu_state(M_ACT, slct_cnt > 0);
    menu_state(M_OBJ, slct_cnt <= 0);
}

/* start to make an object by collecting points */
start_make(type)
{
    char *s;

    switch (make_obj = type) {
    case BOX:     s = "box: top left corner..."; break;
    case ELLIPSE: s = "ellipse: top left corner..."; break;
    case LINE:    s = "line: one end..."; break;
    case TEXT:    s = "text: starting..."; break;
    }
    msg(s);
    mouse_shape(M_CROSS);
    first = 1;
}

/* if enough points have been collected, make the object */
make_object(x, y)
{
    static int fx, fy;

    if (grid)
        snap(&x, &y);
    switch (make_obj) {
    case TEXT:
        *text = 0;
        form("text: %20s", text);
        add_object(TEXT, x, y, x + strlen(text)*8, y+8, text);
        make_obj = 0;
        mouse_shape(M_POINTER);
        msg("");
        break;
    default:
        if (first) {
            fx = x;
            fy = y;
            first = 0;
            line(x-3, y, x+3, y);
            line(x, y-3, x, y+3);
            if (make_obj == LINE)
                msg("other end...");
            else
                msg("bottom right corner...");
        } else {
            add_object(make_obj, fx, fy, x, y, 0L);
            msg("");
            make_obj = 0;
            mouse_shape(M_POINTER);
        }
    }
}

/* snap the coordinates to the nearest grid point */
snap(xp, yp)
int *xp, *yp;
{
    int g2 = grid/2, g4 = grid/4, x = *xp, y = *yp;

    x = ((x + g2) / grid) * grid;
    y = ((y + g4) / g2) * g2;
    msg("x %d->%d y %d->%d", *xp, x, *yp, y);
    *xp = x;
    *yp = y;
}

/* move, copy, or edit a figure */
action_object(x, y)
{
    int i, dx, dy;
    Object *o;

    if (grid)
        snap(&x, &y);
    /* find reference point and compute distance moved */
    dx = dy = (actn_obj == A_EDIT ?
        0 : 10000);
    for (i = 0; i < last_object; i++) {
        o = &objects[i];
        if (o->select) {
            if (actn_obj == A_EDIT) {
                dx = max(o->r, dx);
                dy = max(o->b, dy);
            } else {
                dx = min(o->l, dx);
                dy = min(o->t, dy);
            }
        }
    }
    dx = x - dx;
    dy = y - dy;
    /* do it to all selected items, de-selecting as you go */
    for (i = 0; i < last_object; i++) {
        o = &objects[i];
        if (o->select) {
            o->select = 0;
            highlight(o, 0);
            switch (actn_obj) {
            case A_COPY:
                highlight(o, 0);
                add_object(o->type, o->l + dx, o->t + dy,
                    o->r + dx, o->b + dy, o->data);
                break;
            case A_MOVE:
                draw_object(o, 0);
                o->l += dx;
                o->t += dy;
                o->r += dx;
                o->b += dy;
                draw_object(o, 1);
                break;
            case A_EDIT:
                draw_object(o, 0);
                set_coords(o, o->l, o->t, o->r + dx, o->b + dy);
                draw_object(o, 1);
                break;
            }
        }
    }
    /* deselect all and reset the mouse */
    actn_obj = 0;
    slct_cnt = 0;
    mouse_shape(M_POINTER);
    msg("");
    check_menu();
}

/* read objects from a file */
read_objects()
{
    char type;
    int t, l, r, b;
    FILE *f = fopen(filename, "r");

    if (f != NULL) {
        last_object = 0;
        while (fgets(text, 100, f)) {
            sscanf(text, "%c %d %d %d %d '%[^']\n",
                &type, &l, &t, &r, &b, text);
            add_object(type, l, t, r, b, text);
        }
        fclose(f);
        msg("%d objects loaded", last_object);
    } else
        msg("can't open '%s'", filename);
}

/* write objects to a file */
write_objects()
{
    int i;
    Object *o;
    FILE *f;

    if (*filename == 0 && !get_name())
        return;
    if ((f = fopen(filename, "w")) != NULL) {
        for (i = 0; i < last_object; i++) {
            o = &objects[i];
            fprintf(f, "%c %d %d %d %d '%s'\n",
                o->type, o->l, o->t, o->r, o->b,
                o->type == TEXT ?
o->data : ""); } fclose(f); } else msg("can't write '%s'", filename); } /* save the given string in malloc'ed memory */ char * strsave(s) char *s; { char *malloc(); char *r = malloc(strlen(s)+1); if (r) strcpy(r, s); else msg("out of memory!!!"); return r; } /* re--display all the objects on the screen */ refresh() { int i, x, y, gy; Object *o; clearviewport(); setcolor(MaxColor); if (grid) { gy = grid/2; for (x = grid; x < Maxx; x += grid) for (y = gy; y select) highlight(o, 1); } } /* (de)highlight the current selected item */ highlight(o, color] object *o; { setcolor(color); rectangle(o-->l--2, o-->t--2, o-->l+2, o-->t+2); rectangle(o-->r--2, o-->b--2, o-->r+2, o-->b+2); } /* give the user some feedback */ msg(fmt, a, b, c, d) char *fmt; { static int lastback = 0; setfillstyle(EMPTY_FILL, 0); bar(0, 0, lastback, 8); sprintf(text, fmt, a, b, c, d); setcolor(MaxColor); outtextxy(0, 0, text); lastback = strlen(text) * 8; } /* * object handling */ /* see if x, y are in an object, begin looking at start + 1 */ in_object(x, y) { static int last = 0; int l, r, t, b; Object *o; int i = last+1, n = last_object; while (n-) { if (i >= last_object) i = 0; o = &objects[i]; l = min(o-->l, o-->r); r = max(o-->l, o-->r); t = min(o-->t, o-->b); b = max(o-->t, o-->b); if (x >= l && x <= r && y >= t && y <= b) return (last = i); i++; } return (last = --1); } /* add an object to the object table */ add_object(type, l, t, r, b, data) char *data; { Object *o = &objects[last_object++]; char *s; o-->type = type; set_coords(o, l, t, r, b); o-->select = 0; if (type == TEXT) o-->data = strsave(data); draw_object(o, 1); } /* set the coordinates properly */ set_coords(o, l, t, r, b) Object *o; { if (o-->type == LINE) { /* no fixup on these */ o-->l = l; o-->t = t; o-->r = r; o-->b = b; } else { o-->l = min(l, r]; o-->t = min(t, b); o-->r = max(l, r); o-->b = max(t, b); } } /* draw an object on the screen */ draw_object(o, color) Object *o; { int x, y, xrad, yrad; setcolor(color); switch 
 (o->type) {
    case TEXT:
        x = strlen(o->data) * 8;
        setfillstyle(EMPTY_FILL, 0);
        bar(o->l, o->t, o->l + x, o->t + 8);
        outtextxy(o->l, o->t, o->data);
        break;
    case BOX:
        rectangle(o->l, o->t, o->r, o->b);
        break;
    case LINE:
        line(o->l, o->t, o->r, o->b);
        break;
    case ELLIPSE:
        x = o->l + (o->r - o->l)/2;
        y = o->t + (o->b - o->t)/2;
        xrad = o->r - x;
        yrad = o->b - y;
        ellipse(x, y, 0, 360, xrad, yrad);
        break;
    }
}

/* delete an object */
kill_object()
{
    int i, j;
    Object *o;

    for (i = j = 0; i < last_object; i++) {
        o = &objects[i];
        if (o->select) {
            highlight(o, 0);
            draw_object(o, 0);
            o->select = 0;
        } else {
            if (i > j)
                objects[j++] = objects[i];
            else
                j++;
        }
    }
    last_object = j;
    slct_cnt = 0;
    check_menu();
}

/* start an edit on the selected objects */
start_edit()
{
    int i;
    Object *o;

    /* edit the text objects now */
    for (i = 0; i < last_object; i++) {
        o = &objects[i];
        if (o->type == TEXT && o->select) {
            o->select = 0;
            highlight(o, 0);
            draw_object(o, 0);
            strcpy(text, o->data);
            if (form("edit: %20s", text)) {
                free(o->data);
                o->data = strsave(text);
                o->r = o->l + strlen(text)*8;
            }
            draw_object(o, 1);
            slct_cnt--;
        }
    }
    if (slct_cnt > 0) {     /* must be other stuff */
        start_actn(A_EDIT);
    }
    check_menu();
}

/* initiate an action on selected objects */
start_actn(actn)
{
    switch (actn) {
    case A_COPY: msg("copy to..."); break;
    case A_MOVE: msg("move to..."); break;
    case A_EDIT: msg("editing..."); break;
    }
    actn_obj = actn;
    mouse_shape(M_CROSS);
}

Spiffier Windows For Turbo C

Tony Servies

Tony Servies is a programmer/analyst with World Computer Systems in Oak Ridge, Tennessee. Presently he is working on a project to develop computer-based training programs for the U.S. Navy. His computer interests include utilities and C programming. You may contact him at Route 1, Box 143, Greenback, TN 37742.

Want to spice up your user interface with flashy windows using only a minimal amount of coding and time? With a few lines of code and Borland's Turbo C, it's possible.
Turbo C Window Interface

Two functions can be used to create text windows in Turbo C: gettext() and puttext(). The functions get a screen image and put an image to the screen, respectively. The programmer supplies only the window coordinates and a character string pointer (or character array, if you will); the function does the rest. These remarkable routines do some rudimentary screen I/O quickly and cleanly. One drawback, though, is the inherent lag between the time you write to the window area and the moment the text is displayed. The user 'sees' any text writing that you perform. Most applications today require a window flashed on the screen intact, such as in a pull-down menu. The code in Listing 1 allows writing to a window before it is displayed on the screen. Then when the window is flashed on the screen, it is complete. You call puttext_write() with the x,y coordinates, the window size, the character string to display, the attribute for the string, and the pointer to the window buffer. The x,y coordinates start from the upper left corner at location 0,0. The size of the window is given as the number of columns (width) and the number of rows (height). The string to display is simply a character string stored in standard C format; a '\0' character terminates the string. The buffer is a pointer to an area of characters that denotes the window area. The string attribute is the usual color attribute found in almost every reference manual on PCs.

How It Works

puttext_write() first checks that you are not positioning the data beyond the physical bounds of the window. Of course, this routine will wrap any text past the end of a line onto the following line (unless it is the last line in the window). The routine then gets the pointer address for the last character and attribute pair in the window area, called maxbuffer in the subroutine. The offset for the proper x,y location is added to the buffer so that it points to the correct character.
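The offset arithmetic is worth seeing in isolation: cells are stored row-major, two bytes per cell (character, then attribute), so the cell at column x, row y starts at byte ((y * xsize) + x) * 2. A small sketch with hypothetical helper names (these are mine, not part of the article's listings):

```c
/* Byte offset of the character at (x, y) in an xsize-wide text
   buffer where each cell is a character/attribute pair. */
int cell_offset(int x, int y, int xsize)
{
    return ((y * xsize) + x) * 2;
}

/* Store one character and its attribute at (x, y). */
void put_cell(char *buffer, int xsize, int x, int y, char ch, char attr)
{
    char *p = buffer + cell_offset(x, y, xsize);
    p[0] = ch;      /* character byte */
    p[1] = attr;    /* attribute byte */
}
```

This is the same layout the PC text screen itself uses, which is why a buffer filled this way can be slapped onto the display in one puttext() call.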
While the buffer location is less than the maxbuffer pointer and the character in the string is not the end-of-string terminator ('\0'), the while loop updates the character and attribute in the buffer. The loop terminates only when the buffer overflows (buffer >= maxbuffer) or at the end of the string (*string == '\0'). Now, just put the window on the screen and you're ready to go. I've included a quick and dirty sample program illustrating the flashy windows routine (Listing 2). Note how easy it is to create a window. Just use a character array of XSIZE*YSIZE*2 bytes. (You multiply the area by two because each displayed character is followed by a byte of attribute information (color, blink, etc.).) The program then clears the window and sets all of the attributes. In this example I set all attributes to magenta characters on a cyan background. Then the routine that does the actual call to puttext_write() loops through ten times. After the page is full, I put the window on the screen with the puttext() command and wait for half a second. The routine loops through nine more times until it completes the for loop. I then restore the original screen with the puttext() call for the original screen area (oldbuffer). This routine should enable you to enhance your pull-down, pop-up, and user-entry screens. Feel free to modify the code to account for border areas, highlighted text, etc.

Listing 1

puttext_write (x,y,xsize,ysize,string,attr,buffer)
int x,y,xsize,ysize;
char *string, attr, *buffer;
{
    char *maxbuffer;

    if (x >= xsize || y >= ysize)   /* Range Errors */
        return;
    maxbuffer = buffer+(xsize*ysize*2)-1;
    /* maxbuffer points to the attribute of the last character */
    buffer += (((y*xsize)+x)*2);
    /* buffer points to the first character to write */
    /* While buffer is not overrun and there are characters left
     * to print.
     */
    while ((buffer < maxbuffer) && (*string != '\0')) {
        *buffer++ = *string++;  /* Do character */
        *buffer++ = attr;       /* Do attribute */
    }
}

Listing 2

#include <stdio.h>
#include <conio.h>
#include <dos.h>

#define XSIZE 50
#define YSIZE 15

char newbuffer[XSIZE * YSIZE * 2];  /* Allow for Attributes */
char oldbuffer[XSIZE * YSIZE * 2];

main()
{
    int i, j;
    char key_string[15];

    /* Get the existing screen area and store it in oldbuffer.
     * Subtract 1 from size, since the 1st position is 0.
     */
    gettext (5,5,5+XSIZE-1,5+YSIZE-1,oldbuffer);
    /* Clear the new window area (newbuffer) */
    for (i = 0; i < YSIZE; i++) {
        for (j = 0; j < XSIZE*2; j+=2) {
            newbuffer[i*XSIZE*2+j] = ' ';       /* Blank Space */
            newbuffer[i*XSIZE*2+j+1] = '\35';   /* Attribute */
        }
    }
    /* Loop through 10 times */
    for (j = 0; j < 10; j++) {
        /* Print YSIZE lines */
        for (i = 0; i < YSIZE; i++) {
            sprintf(key_string,"Value %.3d",i+(j*(int)YSIZE));
            puttext_write(1,i,XSIZE,YSIZE,key_string,'\35',newbuffer);
        }
        /* Show it on the screen */
        puttext(5,5,5+XSIZE-1,5+YSIZE-1,newbuffer);
        delay(500);
    }
    /* Restore the original screen */
    puttext(5,5,5+XSIZE-1,5+YSIZE-1,oldbuffer);
}

The C Programmer's Reference: A Bibliography Of Periodicals

Harold Ogg

This article is not available in electronic form.

Standard C Formatted Input

P.J. Plauger

P.J. Plauger has been a prolific programmer, textbook author, and software entrepreneur. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standards committee. This is the fourth in a series of columns on input and output under Standard C. (See "Evolution of the C I/O Model," CUJ August '89, "Streams," CUJ October '89, and "Formatted Output," CUJ November '89.) The topic this month is how to perform formatted input. You can think of it as a natural, but not essential, companion to formatted output. As I emphasized last month, you really must perform output somewhere in every program that you write.
If the output is to be directly digestible by human beings, as is often the case, then you want the program to produce readable text. The formatted output functions help you produce readable text that reflects the values of encoded data in your program. On the other hand, not all programs read input. Those that do can read data directly, using an assortment of standard library functions, and interpret it as they see fit. Converting small integers and text strings for internal consumption are both five-finger exercises that most C programmers perform easily. It is only when you must convert floating point values, or recognize a complex mix of data fields, that standard scanning functions begin to look attractive. Even then the choice is not always clear. The usability of a program depends heavily on how tolerant it is of variations in user input. You as a programmer may not agree with the conventions enforced by the standard formatted input functions. You may not like the way they handle errors. In short, you are much more likely to want to roll your own input scanner. Obtaining formatted input is not simply the inverse of producing formatted output. With output, you know what you want the program to generate next and it does it. With input, however, you are more at the mercy of the person producing the input text. Your program must scan the input text for recognizable patterns, then parse it into separate fields. Only then can it determine what to do next. Not only that, the input text may contain no recognizable pattern. You must then decide how to respond to such an "error." Do you print a nasty message and prompt for fresh input? Do you make an educated guess and bull ahead? Or do you abort the program? Various canned input scanners have tried all of these strategies. No one of them is appropriate for all cases. It is no surprise, therefore, that the history of the formatted input functions in C is far more checkered than that of the formatted output functions.
Most implementations of C have long agreed on the basic properties of printf and its buddies. (A notable exception is the I/O library I originally wrote for the Whitesmiths C compiler. It nicely regularized the names of functions and format conversion specifications, but at a serious cost in compatibility. Eventually, we had to abandon our special dialect of I/O.) By contrast, scanf and its ilk have changed steadily over the years and have proliferated dialects.

Committee X3J11 spent considerable time sorting out the proper behavior of formatted input. Once we agreed on which input conversions to include in Standard C, we had to agree on exactly what they did. Implementations varied on the valid formats for numeric fields. They were all over the map on how to respond to invalid input. They seldom clarified how scanf interacts with ungetc and other I/O functions.

All these decisions had to be made in an atmosphere of general dissatisfaction. A vocal minority wanted major changes in the formatted input functions. An almost silent majority didn't want to be bothered with details about functions they considered useless at best, dangerous at worst. Given all these handicaps, I think X3J11 did rather a good job of clarifying the formatted input functions and making them useful.

After that introduction, I will rashly assume that you still care about the formatted input functions. The rest of this column discusses the scan functions, so called because they all have scan as part of their names. These are the functions that scan input text and convert text fields to encoded data. All are declared in the standard header <stdio.h>. To use the scan functions, you must know how to call them, how to specify conversion formats, and what conversions they will perform for you.
Calling Scan Functions

The Standard C library provides three different scan functions, declared as follows:

int fscanf(FILE *stream, const char *format, ...);
int scanf(const char *format, ...);
int sscanf(const char *src, const char *format, ...);

The function fscanf obtains characters from the stream stream. The function scanf obtains characters from the stream stdin. Both stop scanning input early if an attempt to obtain a character sets the end-of-file or error indicator for the stream. The function sscanf obtains characters from the null-terminated string beginning at src. It stops scanning input early if it encounters the terminating null character for the string.

Note that all of the functions accept a varying number of arguments, just like the print functions. And just like the print functions, you had better declare any scan functions before you use them by including <stdio.h>. Otherwise, some implementation may go crazy when you call your undeclared scan function.

All the functions accept a read-only format argument, which is a pointer to a null-terminated string. The format tells the function what additional arguments to expect, if any, and how to convert input fields to values to be stored. (A typical argument is a pointer to a data object that receives the converted value.) It also specifies any literal text or whitespace you want to match between converted fields. If scan formats sound remarkably like print formats, the resemblance is quite intentional. But there are also important differences. I will revisit formats in considerable detail later in this column.

All the functions return a count of the number of text fields converted to values that are stored. If any of the functions stops scanning early for one of the reasons cited above, however, it returns the value of the macro EOF (defined in the standard header <stdio.h>). Since EOF must have a negative value, you can easily distinguish it from any valid count, including zero.
Note, however, that you can't tell how many values were stored before an early stop. If you need to locate a stopping point more precisely, break your scan call into multiple calls.

A scan function can also stop scanning because it obtains a character that it is unprepared to deal with. In this case, the function returns the cumulative count of values converted and stored. You can determine the largest possible return value for any given call by counting all the conversions you specify in the format. The actual return value will be between zero and this maximum value, inclusive.

When either fscanf or scanf obtains such an unexpected character, it pushes it back to the input stream. (It also pushes back the first character beyond a valid field when it has to peek ahead to determine the end of the field.) How it does so is similar to calling the function ungetc. There is a very important difference, however. You cannot portably push back two characters to a stream with successive calls to ungetc (and no other intervening operations on the stream). You can portably follow an arbitrary call to a scan function with a call to ungetc for the same stream. What this means effectively is that the one-character pushback limit imposed on ungetc is not compromised by calls to the scan functions. Either the implementation guarantees two or more characters of pushback to a stream or it provides separate machinery for the scan functions.

Note that the scan functions push back at most one character. Say, for example, that you try to convert the field 123EASY as a floating point value. The field is, of course, invalid. Even the subfield 123E is invalid, since the conversion requires at least one exponent digit. What happens is that the subfield 123E is consumed and the conversion fails. No value is stored and the scan function returns. The next character to read from the stream is A. This behavior matters most for floating point fields, which have the most ornate syntax.
Other conversions can usually digest all the characters in the longest subfield that looks valid.

As a final point, the Standard C library does not provide any of the functions vfscanf, vscanf, or vsscanf. These are obvious analogs to the print functions vfprintf, vprintf, and vsprintf which I described last month. X3J11 simply felt that there was not enough call for such scan functions to require them of all implementations.

Writing Formats

Last month, I described the print formats as a mini programming language. The same is, of course, true of the scan formats. I also commented earlier that print and scan formats look remarkably alike. This should serve as both a comfort and a warning to you.

The comfort is that the print and scan functions are designed to work together. What you write to a text file with one program should be readable as a text file by another. Any values you represent in text by calling a print function should be reclaimable by calling a scan function. (At least they should be to good accuracy, over a reasonable range of values.) You would even like the print and scan formats to resemble each other closely.

Doug McIlroy, at AT&T Bell Laboratories, makes a stronger statement. He feels that any good formatted I/O package should let you write identical formats for print and scan function calls. A formatting language that is not symmetric, he feels, is deficient. I believe that Standard C comes close to achieving this goal. It is at least possible for you to write symmetric formats (those that read back what you wrote out). Be warned, however, that developing symmetry can take a bit of extra thought.

And here lies the danger. The fact remains that the print and scan format languages are different. Sometimes the apparent similarity is only superficial. You can write text with a print function call that does not scan as you might expect with a scan function call using the same format.
Be particularly wary when you print text using conversions with no intervening whitespace. Be somewhat wary when you print adjacent whitespace in two successive print calls. The scan functions tend to run together fields that you think of as separate.

The basic operation of the scan functions is, indeed, the same as for the print functions. Call a scan function and it scans the format string once from beginning to end. As it recognizes each component of the format string, it performs various operations. Most of these operations consume characters sequentially from a stream (fscanf or scanf) or from a string stored in memory (sscanf). Many of these operations generate values that the scan function stores in various data objects that you specify with pointer arguments. Any such arguments must appear in the varying length argument list, in the order in which the format string calls for them. For example,

sscanf("thx 1138", "%s%2o%d", a, &b, &c);

stores the string "thx" in the char array a, the value 9 (octal eleven) in the int data object b, and the value 38 in the int data object c. It is up to you to ensure that the type of each actual argument pointer matches the type expected by the scan function. (The pointer must, of course, also point to a data object of the expected type.) Standard C has no way to check the types of additional arguments in a varying length argument list.

Not every part of a format string calls for the conversion of a field and the consumption of an additional argument. In fact, only certain conversion specifications gobble arguments. Every conversion specification begins with the % escape character and matches one of the patterns described below. The scan functions treat everything else either as whitespace or as literal text. Whitespace in a scan format, by the way, is whatever the standard library function isspace (declared in <ctype.h>) says it is. That can change if you call the function setlocale (declared in <locale.h>) before you call the scan function.
Your program begins execution in the "C" locale, where whitespace is what you have learned to know and love. A sequence of one or more whitespace characters in a scan format is treated as a single entity. It consumes an arbitrarily long sequence of whitespace characters from the input. (Again, whitespace is whatever the current locale says it is.) The whitespace in the format need not resemble the whitespace in the input in any way. The input may contain no whitespace at all. Whitespace in the format simply guarantees that the next input character (if any) is not a whitespace character.

Any character in the format that is not whitespace and not part of a conversion specification calls for a literal match. The next input character must match the format character. Otherwise, the scan function returns with the current count of converted values stored. A format that ends with a literal match can produce ambiguous results. You cannot determine from the return value whether the trailing match failed. Similarly, you cannot determine whether a literal match failed or a conversion that follows it. For these reasons, literal matches have only limited use in scan formats.

For completeness, I should point out that a literal match can be any string of multibyte characters. Each sequence of literal text must begin and end in the initial shift state, if your target environment uses a state-dependent encoding for multibyte characters. I suspect, however, that you will have little need to match Kanji characters with scan formats in the next few years.

Conversion Specifications

A scan conversion specification differs from a print conversion specification in fundamental ways. You cannot write any of the print conversion flags and you cannot write a precision (following a decimal point). On the other hand, scan conversions have an assignment-suppression flag and a conversion specification called a scan set. Following the % you write three components. All but the last component are optional.
In order: You write an optional asterisk (*) to specify that the converted value is not to be stored. You write an optional field width to specify the maximum number of input characters to match when determining the conversion field. The field width is an unsigned decimal integer. Many conversions skip any leading whitespace, which is not counted as part of the field width. Finally, you write a conversion specifier to determine the type of any argument, how to determine its conversion field, and how to convert the value to store.

You write a scan set conversion specifier between brackets ([]); all other conversion specifiers consist of one- or two-character sequences from a predefined list of about three dozen valid sequences. The two-character sequences begin with an h, l, or L, to indicate alternate argument types. I describe scan sets and list all valid sequences in Table 1. Don't write anything else in a scan format if you want your code to be portable.

The goal of each formatted input conversion is to determine the sequence of input characters that constitutes the field to convert. The scan function then converts the field, if possible, and stores the converted value in the data object designated by the next pointer argument. (If assignment is suppressed, no function argument is consumed.) Unless otherwise specified below, each conversion first skips arbitrary whitespace in the input. Skipping is just the same as for whitespace in the scan format. The conversion then matches a pattern against succeeding characters in the input to determine the conversion field. You can specify a field width to limit the size of the field. Otherwise, the field extends to the last character in the input that matches the pattern. The scan functions convert numeric fields by calling one of the standard library functions strtod, strtol, or strtoul (declared in <stdlib.h>). A numeric conversion field matches the longest pattern acceptable to the function it calls.
Scan Sets

A scan set behaves much like the s conversion specifier. It stores up to w characters (default is the rest of the input) in the array of char pointed at by ptr. It always stores a null character after any input. It does not, however, skip leading whitespace. It also lets you specify what characters to consider as part of the field. You can specify all the characters to match, as in:

"%[0123456789abcdefABCDEF]"

which matches an arbitrary sequence of hexadecimal digits. Or you can specify all the characters that do not match, as in:

"%[^0123456789]"

which matches any characters other than digits. If you want to include the right bracket (]) in the set of characters you specify, write it immediately after the opening [ (or [^). You cannot include the null character in the set of characters you specify.

Some implementations may let you specify a range of characters by using a minus sign (-). The list of hexadecimal digits, for example, can be written as:

"%[0-9abcdefABCDEF]"

or even, in some cases, as:

"%[0-9a-fA-F]"

Please note, however, that such usage is not universal. Avoid it in a program that you wish to keep maximally portable.

Table 1: Conversion Specifiers

In the descriptions that follow, I summarize the match pattern and conversion rules for each valid conversion specifier. w stands for the field width you specify, or the indicated default value if you specify no field width. ptr stands for the next argument to consume in the varying length argument list:

c -- stores w characters (default is 1) in the array of char whose first element is pointed at by ptr. It does not skip leading whitespace.

d -- converts the integer input field by calling strtol with a base of 10, then stores the result in the int pointed at by ptr.

hd -- converts the integer input field by calling strtol with a base of 10, then stores the result in the short pointed at by ptr.
ld -- converts the integer input field by calling strtol with a base of 10, then stores the result in the long pointed at by ptr.

e -- converts the floating point input field by calling strtod, then stores the result in the float pointed at by ptr.

le -- converts the floating point input field by calling strtod, then stores the result in the double pointed at by ptr.

Le -- converts the floating point input field by calling strtod, then stores the result in the long double pointed at by ptr.

E -- is the same as e. lE -- is the same as le. LE -- is the same as Le.

f -- is the same as e. lf -- is the same as le. Lf -- is the same as Le.

g -- is the same as e. lg -- is the same as le. Lg -- is the same as Le.

G -- is the same as e. lG -- is the same as le. LG -- is the same as Le.

i -- converts the integer input field by calling strtol with a base of zero, then stores the result in the int pointed at by ptr. (A base of zero lets you write input that begins with 0, 0x, or 0X to specify an actual numeric base other than 10.)

hi -- converts the integer input field by calling strtol with a base of zero, then stores the result in the short pointed at by ptr.

li -- converts the integer input field by calling strtol with a base of zero, then stores the result in the long pointed at by ptr.

n -- converts no input, but stores the cumulative number of matched input characters in the int pointed at by ptr. It does not skip leading whitespace.

hn -- converts no input, but stores the cumulative number of matched input characters in the short pointed at by ptr. It does not skip leading whitespace.

ln -- converts no input, but stores the cumulative number of matched input characters in the long pointed at by ptr. It does not skip leading whitespace.

o -- converts the integer input field by calling strtoul with a base of eight, then stores the result in the unsigned int pointed at by ptr.
ho -- converts the integer input field by calling strtoul with a base of eight, then stores the result in the unsigned short pointed at by ptr.

lo -- converts the integer input field by calling strtoul with a base of eight, then stores the result in the unsigned long pointed at by ptr.

p -- converts the pointer input field, then stores the result in the void * pointed at by ptr. Each implementation defines its pointer input field to be consistent with pointers written by the print function.

s -- stores up to w non-whitespace characters (default is the rest of the input) in the array of char pointed at by ptr. It first skips leading whitespace, and it always stores a null character after any input.

u -- converts the integer input field by calling strtoul with a base of 10, then stores the result in the unsigned int pointed at by ptr.

hu -- converts the integer input field by calling strtoul with a base of 10, then stores the result in the unsigned short pointed at by ptr.

lu -- converts the integer input field by calling strtoul with a base of 10, then stores the result in the unsigned long pointed at by ptr.

x -- converts the integer input field by calling strtoul with a base of 16, then stores the result in the unsigned int pointed at by ptr.

hx -- converts the integer input field by calling strtoul with a base of 16, then stores the result in the unsigned short pointed at by ptr.

lx -- converts the integer input field by calling strtoul with a base of 16, then stores the result in the unsigned long pointed at by ptr.

X -- is the same as x. hX -- is the same as hx. lX -- is the same as lx.

% -- converts no input, but matches a percent character (written %% in the format).

Doctor C's Pointers (R)

The Memory Management Library

Rex Jaeschke

Rex Jaeschke is an independent computer consultant, author and seminar leader.
He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementers of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA, 22091 or via UUCP at uunet!aussie!rex. The C run-time library has long had a family of routines that enable a programmer to allocate and free memory at run-time, at his pleasure. This capability is a powerful one and was adopted (and somewhat expanded) in ANSI C. Oftentimes you define an array of elements (necessarily of fixed size) only to find that, in most cases, you don't use all the elements or that, in some cases, you need just a few more. What you need is the ability to have variable sized arrays. However, according to the definition of C, the dimension of an array in a definition must be a compile-time integer constant. That is, the C language does not support such constructs. (Note that the Numerical C Extensions Group, of which I am the convener, is investigating the possibility of adding such a construct.) However, this idea can be implemented using the memory allocation routines in the standard library. The beauty of these allocation routines is twofold: the programmer determines just when space is allocated and exactly how long it is kept, and, if the program is written correctly, you can change the manner in which the space is allocated and freed, transparently. Let's discuss the second point further. ANSI C defines the term storage duration by saying "An object has a storage duration that determines its lifetime. There are two storage durations: static and automatic." I prefer to also add a third duration, dynamic. An object having dynamic storage duration is one allocated by the programmer using the library. (For the purposes of this discussion, the address space from which dynamic objects are allocated will be referred to as the heap. 
This term is widely used for this purpose but is not used in the ANSI C Standard.) Consider the following example:

#include <stdlib.h>

void f()
{
    char c1[100];
    static char c2[100];
    char *c3;

    c3 = malloc(100);

    c1[10] = 'a';
    c2[10] = 'a';
    c3[10] = 'a';
}

Ignoring the possibility of malloc() failing to allocate memory, c1, c2, and c3 can be used to designate the automatic, static, and dynamic arrays, respectively. Since the notation for referencing all three arrays is identical, the executable code can be ignorant of the object's storage duration. You can change from automatic to dynamic, from dynamic to static, etc., with no real impact on the code, if you design it appropriately to begin with.

The allocation functions somehow magically change the address space of our program at run-time. The way in which this is done is specific to an implementation and may vary widely. In any case, an understanding of such details is unnecessary to use the allocation functions effectively. All you need know is that if they succeed, the requested space is allocated contiguously and you are given a base address.

The Parent Header

In the not too distant past, there were only four or five "standard" headers. Apart from those, there was wide variation as to which functions were provided and in which header (if any) they were declared. ANSI C requires the allocation functions to be declared in the header stdlib.h. Many implementations currently declare them in malloc.h as well as, or instead of, stdlib.h. I have also seen quite a lot of old code that contained explicit declarations for these functions, presumably because no header in their implementation contained them. As a result of ANSI C, the declarations of these functions have changed with regard to both return and argument types. ANSI C adopted the concept of a void pointer from C++.
This solved two important issues: it provided a bridge for porting code across byte and word (and other) architectures where different pointer types may actually have different physical representations, and secondly, it provided a way to represent a generic pointer, one that simply contained an address of some (unknown) object type. Since the allocation routines are not given any information about the type of object a programmer wishes to store in the allocated space, the pointers used and returned by these functions were prime candidates for type void *.

A consequence of this is that the returned value no longer need be explicitly cast. For example, in the following case:

int *pi;

pi = (int *)malloc(10 * sizeof(int));
pi = malloc(10 * sizeof(int));

the assignments are equivalent, since a void pointer is assignment-compatible with all other pointer types. (Historically, it was common to see such casts even though they generally were not needed. That is, strict pointer assignment-compatibility checking was not enforced as is now required by ANSI.) If some of your code explicitly declares the allocation functions as having return types of char *, without such casts you will get errors when compiling in strict ANSI mode if the target of the assignment has a type other than char *. The best solution to this is to remove the explicit declaration and include stdlib.h instead.

With ANSI's adoption of function prototypes from C++, stdlib.h now describes the allocation routines' argument type information as well. Again, all pointer types here have type void *, but this is of no consequence since any "real" pointer type is compatible with void * and, as such, objects of such type can be passed in.

ANSI C has invented the type size_t, the type of a sizeof expression. This type is typedefed in numerous standard headers, including stdlib.h, and is used in various library function prototypes (including the allocation functions) for the type of sizes and counts.
Since sizes and counts can never be negative, size_t is an unsigned integer type. However, the underlying type of size_t is implementation-defined and may be unsigned int or unsigned long. Historically, descriptions of the allocation functions stated that sizes and counts had type unsigned int.

The Allocation Functions

calloc

#include <stdlib.h>
void *calloc(size_t nmemb, size_t size);

calloc() allocates contiguous space for nmemb objects, each of whose size is size. The space allocated is initialized to all-bits-zero. Note that this is not guaranteed to be the same representation as floating-point zero or the null pointer constant NULL.

free

#include <stdlib.h>
void free(void *ptr);

free() causes the space (previously allocated by calloc(), malloc(), or realloc()) pointed to by ptr to be freed. If ptr is NULL, free() does nothing. Otherwise, if ptr is not a value previously returned by one of these three allocation functions, the behavior is undefined. The value of a pointer that refers to space that has been freed is indeterminate, and such pointers should not be dereferenced. Note that free() has no way to communicate an error if one is detected.

On some systems, most noticeably MS-DOS, freed space may not actually be given back to the operating system. (It likely will, however, be available for future allocations within that program.) It might only really be released when the program terminates. One consequence of this is that if you try to execute another program from within a running program that has freed up memory using free(), there still might not be sufficient physical memory available to start the new program.

malloc

#include <stdlib.h>
void *malloc(size_t size);

malloc() allocates contiguous space for size bytes. The space allocated has no guaranteed initial value.

realloc

#include <stdlib.h>
void *realloc(void *ptr, size_t size);

realloc() changes the size of the space pointed to by ptr to have size size. If ptr is NULL, realloc() behaves like malloc().
Otherwise, if ptr is not a value previously returned by calloc(), malloc(), or realloc(), the behavior is undefined. The same is true if ptr points to space that has been freed. size is absolute, not relative. If size is larger than the size of the existing space, new uninitialized contiguous space is allocated at the end; the previous contents of the space are preserved. If size is smaller, the excess space is freed; however, the contents of the retained space are preserved. If realloc() cannot allocate the requested space, the contents of the space pointed to by ptr remain intact. If ptr is non-NULL and size is 0, realloc() acts like free().

Whenever the size of space is changed by realloc(), the new space may begin at an address different from the one given it, even when realloc() is truncating. Therefore, if you use realloc() in this manner, you must beware of pointers that point into this possibly-moved space. For example, if you build a linked list there and use realloc() to allocate more (or less) space for the chain, it is possible that the space will be "moved," in which case the pointers now point to where successive links used to be, not where they are now. You should always use realloc() as follows:

ptr1 = realloc(ptr, new_size);
if (ptr1 != NULL) {
    ptr = ptr1;
    ...
}

This way, you never care whether the object has been relocated, since you always update ptr each call to point to the (possibly new) location.

General Comments

The way in which a heap is physically organized can vary widely. On some systems, the stack and the heap (and possibly even the static data area) share the same address space. On others, each may have its own address space. Some MS-DOS implementations provide both near and far heaps. Historically, many C implementations have permitted the allocation of zero bytes to be successful. That is, a non-NULL pointer is returned.
Since ANSI C does not permit zero-sized objects to be defined, this practice was hotly debated during X3J11 deliberations. As a compromise, if you attempt to allocate zero bytes, it is implementation-defined whether a null pointer or a unique pointer is returned.

We are told that if an allocation attempt fails, NULL is returned. The common approach I've seen to this is to display some error message and call exit(). However, most applications I have seen could ill afford to actually do this, since it would leave disk files or shared-memory data areas compromised. For example, if you cannot get more dynamic space, you may have quite some work to undo your current situation before you can gracefully terminate or continue. On the other hand, failure to allocate more memory when doing an in-memory sort can simply be handled by writing the sorted tree to disk, freeing the memory, and starting on the next set of strings. In such cases, the failure to allocate memory is not fatal. In cases where it is, you must consider the ramifications of receiving a NULL return at design time, not during maintenance when the first failure occurs.

When heap allocation fails, it might well be useful to find out how much you can get. Unfortunately, ANSI C does not provide this capability. Several implementations (including Microsoft's) do provide some help in this area. Either they can tell you how much is available now in one allocation, or how many allocations you can make of a given size. (The two need not add up to the same number of bytes, since each time you request bytes, extra bytes may also be fetched to help manage the space allocated.) Similarly, ANSI C provides no help in debugging heap-related problems by "walking the heap links" and the like. Again, it's up to the quality of the implementation.

On some systems (VAX/VMS, for example) the cost of allocating memory dynamically can be somewhat expensive. As such, a caching approach may be taken.
That is, when you free memory, the larger of the freed block and the currently held cache is kept. The idea is that if you alternately allocate and free, each new allocation will have some chance of getting memory from the freed cache.

ANSI C guarantees that any non-NULL address returned by the allocation functions will be aligned appropriately so it can be dereferenced via any pointer type. On systems that require object alignment, this means that space is allocated in multiples of some cluster value (machine words, for example). On such systems, more memory may actually be allocated than you requested. If your program contains a bug and copies (slightly) beyond the end of allocated memory, the bytes overwritten may be those extra ones and no error occurs. However, if you change the request by a few extra bytes, the bug may manifest itself. The most common example I see is as follows:

char name[30];

getname(name);
pc = malloc(strlen(name));
strcpy(pc, name);

Here, strcpy() adds a null character to the destination but no space was allocated for it. If the length was odd and malloc() allocates an even number anyway, the problem will not be observed. However, with even length names it may well appear.

It is considered good style to explicitly free allocated memory when you are done with it. Presumably, if you don't, this is done when your program terminates (although ANSI C does not say so). Note that if you "forget" where your allocated space resides (by overwriting the pointer value returned by malloc(), for example), there is no way of getting that address back. One relatively easy way of having this happen is to use:

ptr = realloc(ptr, new_size);

If realloc() fails, you have lost the address of the original area.

An alternate memory allocation system also exists on many systems. It usually involves using sbrk(). The two schemes are incompatible and must not be used in the same program. ANSI C does not include this alternate scheme.
Transparent Heap Usage

It is possible that your program calls the allocation routines even if you don't call them yourself. For example, some library routines might need dynamic space to efficiently handle variable-size amounts of local information. Many systems have a fixed limit on the number of open files they support. However, others do not. They can achieve this by building a linked list of FILE objects using the allocation routines. They may even include stdin, stdout, and stderr in this list, in which case the program startup code may contain calls to malloc(), etc. Compile and link an empty main() program and look at the linker map to see if these library functions are called at startup.

Multi-Dimensional Arrays

Occasionally, it may be necessary to allocate a multi-dimensional array on the heap. This can be done just as easily as for single-dimensioned arrays once you master the required pointer declaration. For example,

double (*pd)[10];

pd = malloc(50 * sizeof(double));
pd[3][2] = 1.234;

By declaring pd to be a pointer to an array of 10 doubles, pd can be subscripted to two levels. pd[3] designates the fourth row of 10 elements and pd[3][2] designates the third column in that row. (If you are confused about the difference between a pointer to double and a pointer to an array of double, you will have to wait for a future column.)

Implementer's Notebook
Life With Static Buffers
Don Libes

This article is not available in electronic form.

Applying C++
Designing And Implementing A Text Editor Using OOP, Part 1
Tsvi Bar-David

Tsvi Bar-David is president of Deerworks and currently a faculty member in the Software Engineering Department at Monmouth College. He received his PhD in mathematics from the University of California at Berkeley. Previously, he was employed at Bell Labs in the development and delivery of UNIX, C++, and Object-Oriented courses.
In my July 1989 column on training for object-oriented programming, I presented a simple framework for object-oriented design. Today we embark upon a journey -- likely to last several columns -- in which we apply the design framework to the problem of constructing a simple text editor. Along the way we will develop some types which are useful not only in building the editor, but also as tools in general, and so can serve as members of a general-purpose object library. Most languages, including C++, require that the solution to a problem be represented as a main program. This we will do. Yet, our goal is not to design and build programs, but rather to identify and construct useful object types, out of which we can construct an infinite number of programs. In a sense akin to mathematics, we are constructing a solution not to just one problem, but rather to a family of related problems -- for example, the problem of editing text. It is precisely this approach to problem solving, I believe, that permits an object-oriented design (design as a noun, the result of the design process) to be easily modified, enhanced, and re-used. The brevity of the main programs that we build reflects this approach; typically these programs instantiate an object or two and then invoke a couple of member functions. Bertrand Meyer [2] takes and supports very much the same position in the Eiffel language. Indeed, Eiffel has no main program; one simply selects a first object to which to send a message. The action associated with that message goes ahead and creates other objects and sends messages to them ad infinitum.

Design Framework

You should refer to the July 1989 column for details of the design framework. In sum, the framework manages a process that maps a requirements document to an implementable design document. To quote the earlier column: "The heart of object-oriented design is the identification of the types in the program and the relationships between them.
To identify a type is to specify its behavior (public interface). To identify relationships means bringing to light the relationships (inheritance and parametric types) in the behavior of the types. One can then implement the behavior in many ways." Here is the pseudo-code for the design process.

initial decomposition(on requirements document);
while( stopping condition has not been met )
{
    abstraction;
    type relationships;
    type decomposition;
}
return design specification;

In order to begin the design process, we need a behavioral description of the object we want to build, namely the text editor.

Describing The Editor

The ced editor allows the user to create new text files or edit existing ones. The editor views the file as just a sequence of characters (thus the 'c' in ced) with no other structure, such as a sequence of lines. Since newline ('\n') is just an ordinary character, we can easily recover the traditional line structure of a file by using ordinary edit operations. In addition, the editor maintains the notion of a current point in the file. The point is regarded as being between two characters. The notion of current point is pretty close to the concept of current offset in UNIX files. At this point, we have to make a requirements decision about the user interface to the editor. For the sake of simplicity, assume that the editor has a traditional command-line interface like edlin on MS-DOS systems or ed on UNIX systems (the input command stream looks like a sequence of lines). Each line consists of an optional integer prefix followed by a character. The table below associates commands with the characters that invoke them. N.B. Bracketed arguments are optional.

[n] g -- Move point to just before the nth character (zero-based). Default value for n is 0.

[n] p -- Print n characters starting at the first character after point, followed by a newline. n defaults to 1. Increment point by n.

i -- Insert an arbitrary number of characters before point.
Terminate insertion with '.' on a line by itself.

[n] d -- Delete n characters starting at the first character after point. n defaults to 1.

[n] y -- Paste whatever was last deleted n times just before point. n defaults to 1.

w [file] -- Write out the internal representation of the file (the buffer) to the named file. The primary default for file is the filename command-line argument to ced. If ced was invoked without a filename, it selects the last file written to.

q -- Exit the editor.

? -- Print out useful information, like filename, point, and size of file.

Normally the editor scans standard input for commands. However, for flexibility, the editor should be able to get its command stream from a file or possibly some other source, like a string or a window. When the editor is invoked with an argument at the command line, such as

prompt> ced filename

the editor opens an existing file for editing or creates an empty file of that name. In either case, point is located just before the first character in the file. If the editor is called without an argument

prompt> ced

it manages an editing session. The user decides how to explicitly write the contents to a named file. A typical edit session might look like:

36g
i
hello there
.
g
50p
w
q

Initial Decomposition

Our task now is to identify the high-level types from the requirements, out of which we will construct the editor. Certainly File is one of these types and is used in two ways: as the file to be created or modified, and as the command stream (typically standard input from the terminal). In our description of the editor write command, we briefly mentioned the internal representation of the file under edit, traditionally known as the buffer. Is the Buffer type synonymous with the File type?
We can answer this question more easily once we have described (the abstraction step) the public interface of both File and Buffer; namely, if the public interfaces (really, the manual pages) of two types are the same, then the types are one and the same. At the risk of getting ahead of ourselves, let's try to answer this question right now. Assume that a File object essentially has the semantics of a standard I/O FILE object (as supported by the standard run-time library of the ANSI C compiler [1]). Files and Buffers may very well share the offset or point concept. On the other hand, whatever a Buffer is, it must support the editor commands listed in the requirements section, particularly insertion and deletion. Yet, there are no native insertion and deletion operators on Files. The operation that puts a character into a file (putc( int, FILE *)) can be considered as inserting only when appending to the file, not if the file offset is anywhere in the middle of the file (it will overwrite the character at the offset). This is not the behavior we are looking for. We conclude then that a Buffer is not a File, and so we must design and implement the Buffer abstraction. Now, we may be able to implement Buffer in terms of File (as some implementations of the full-screen editor will do), but that is merely (yes, merely!) a matter of implementation and is not to be confused with the behavior or semantics of the Buffer. Is the editor itself a type? Even though giving the Editor a type may seem unnatural at first, we will reap the benefits already mentioned. Our design policy is clear, albeit extreme -- everything in the application is an object of one type or another. So what is the behavior of an editor object? An editor object interprets the command stream and performs actions both upon a buffer and the user interface, which for now is just standard output.
That is, the editor coordinates three objects: the input (command) stream, a buffer, and an output stream (a view of the buffer). For simplicity of design, assume that an editor object manages precisely one buffer, which corresponds to at most one file. I say "at most" and not "precisely one" since the edit program ced can be invoked with no arguments. In such a case, the program presumably contains an editor object which manages a buffer, which currently does not correspond to any file. Later on, we will build an edit program based upon the Editor type which manages multiple buffers and files, something in the spirit of emacs. Now that we have identified the object types Editor, Buffer, and File, we must perform the design process on each of the types. We'll start with File since it is the most familiar of the types in our working list. But why even bother representing File as a class when all C++ compilers already support the standard I/O FILE structure? There are several reasons:

Consistency. We want objects of all types in our application -- other than built-in types of the language -- to be represented by classes. This provides developers and maintainers a uniform feel of object orientation. The message expression object.memberfunction() will be the sole means of communicating with an object. Using a standard I/O function like putc( 'a', fp) directly on a FILE pointer (fp) would violate this desideratum.

Insulation. We can regard our File type as an application-specific type layered on top of the environment's existing I/O support. This helps to make the editor more portable. When you port the editor to a new operating system, only the implementation of File need change. Other code that uses File doesn't change one iota. But we can do better. Since every C++ compiler's run-time support library contains FILE, we can just implement/layer File on top of FILE.
Furthermore, there won't be much of a run-time penalty for this layering, if we declare all of the member functions of File to be inline! For File's public interface you need the five classic operations of a minimal interface.

open -- connect the program to the named file or create it.
close -- sever the connection between the program and the file.
iseof -- returns true if at end-of-file, otherwise false.
get -- get a character and advance the file offset.
put -- put a character and advance the file offset.

We can get rid of the explicit open and close member functions elegantly by using a constructor and destructor respectively. The advantage of this approach is that an instantiated File object is guaranteed to be initialized properly. Furthermore, mapping close to the destructor guarantees that when the File object dies (goes out of scope) in the program, the associated file in the file system is automatically closed, without the client programmer having to explicitly close it. The public interface of File as a C++ class is

class File {
public:
    File( char *name = "", char *mode = "r");
    ~File();
    Truth iseof();
    int get();
    void put( int c);
private:
    // data members
};

The constructor takes two arguments, and both are provided with defaults. Here are the intended semantics. The declaration

File f;

invokes the default constructor File( "", "r"), which connects the object f to standard input for reading.

File f( filename);

invokes File( filename, "r") and so opens filename for reading.

File f( filename, mode);

opens filename with some mode (with the same semantics as fopen()). So, for example,

File f( "foo", "w");

opens the file foo for writing. Before we wax too lyrical about the joys of using constructors in place of an open function, we must face a design problem. Just after the constructor runs, how do we know that the file is really open?
If the open failed for any reason (the file doesn't exist, we don't have the correct permissions, etc.), it would be nonsensical to invoke any member function against the object. One solution is to forget the constructor approach and just endow File with an explicit open function of the following form

typedef int Truth;    // boolean type

Truth File::open( char *filename, char *mode);

The open function could report success or failure of the operation in a manner similar to the C/C++ library functions fopen() and open() -- by returning a boolean value (the value is regarded as boolean by convention). However, for those who want to stick with the constructor approach, here is another solution to the problem. Endow File with a member function

// returns TRUE if open succeeded in constructor
Truth File::isok();

whose sole purpose is to report on the status of the open performed in the constructor. Perhaps a separate isok() function is unnecessary; iseof() can report on the open. But assigning this function to iseof() is bad design for two reasons. First, checking for end-of-file is conceptually a completely separate matter from checking to see if the open succeeded. And second, how are we to interpret the return value of iseof() on a newly created file for writing? To play it safe, we will have two predicate functions. The File type was originally developed to support a lexical scanner object. To make implementing the scanner easier, we included the following additional member functions in File's public interface:

class File {
public:
    ...
    void unget(int c);
    int peek();
    ...
};

unget() pushes the character c back onto an input stream. c is the next character get() gets. peek() returns the value of the next character without removing it from the input stream. As we have alluded, we can piggy-back or layer the implementation of File on top of standard I/O FILE. One easy implementation is found in Listing 1.
The only thing difficult about this layered implementation was figuring out that we needed a state data member for recording the status of the open. All the member functions, with the exception of the constructor, are one-liners.

Wrap-up

In this column we have begun applying an object-oriented design framework to the problem of constructing a text editor. Starting from a description of the editor's behavior, we have identified three types of objects: Editor, Buffer, and File. We discussed how File might be used by other types and let that guide us in identifying its public interface. We then wrote a portable implementation of File layered on top of the standard I/O FILE abstraction. In the next column, we will continue on our journey, focusing our attention on the Buffer abstraction. In the course of designing Buffer, we will become acquainted with two useful parametric container types, Sloop[T] and Yacht[T], that will make the implementation of Buffer rather simple.

Bibliography

[1] Brian Kernighan and Dennis Ritchie, The C Programming Language, second edition, 1988, Prentice Hall.
[2] Bertrand Meyer, Object-Oriented Software Construction, 1988, Prentice Hall. (Addresses object-oriented design, including parametric types.)

Listing 1

class File {
public:
    File( char *name = "", char *mode = "r")
    {
        if( *name )
            fp = fopen( name, mode);
        else if( *mode == 'r' )
            fp = stdin;
        else
            fp = stdout;
        state = (fp != 0);      // record whether the open succeeded
    }
    ~File()           { if( fp) fclose( fp); }
    Truth isok()      { return state; }
    Truth iseof()     { return feof( fp); }
    int get()         { return getc( fp); }
    void unget(int c) { (void)ungetc( c, fp); }
    int peek()        { int c = get(); unget(c); return c; }
    void put( int c)  { putc( c, fp); }
private:
    FILE *fp;
    int state;
};

Questions & Answers

Readability, Portability, And Coding Style

Ken Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and is a member of the ANSI C committee.
He also does custom C programming for communications, graphics, and image databases. His address is 4201 University Dr., Suite 102, Durham, NC 27707.

Q I would appreciate your comments on the following questions and problems:

1. Type char: signed or unsigned? Most compilers consider chars as signed by default. We, European users, make extensive use of ASCII codes above 127 and the signed chars default does not seem to be the best choice. Which mode, in your opinion, is "better"? Why are constant chars considered as ints? The following:

char c = 'è';
if (c == 'è')

will work only if the default char is unsigned. Otherwise, a cast to (char) is necessary to get the program to work, yet the constant 'è' is clearly a char, not an int.

2. Good use or abuse of #defines and typedefs? What does one think of the current practice of #defining or typedefing native C types, like char into BYTE, unsigned char into BYTE or UBYTE, char * into TEXT, int into COUNT, int into BOOL, etc.? Is there really a reason for this (except (sometimes!) for portability, of course)? There are no such things (as far as I know) in the standard library header files! Moreover, when strictly prototyped programs are compiled the result is generally a long list of type-mismatch errors (often pointer mismatches between (char *) and (unsigned char *)).

3. New C programming style What do you think of the 'new' (?) C style of programming, à la Pascal, with (long) identifiers mixing lowercase and uppercase and banishing the underscore?

Thanks for your opinion and sincerely yours, Hubert Toullec Angers, France

A In the ANSI C committee meetings, there was considerable discussion as to whether a particular feature of the language should be made right or whether backward compatibility should be preserved, to avoid "breaking" existing programs that used documented features of the language.
If George Burns (in "Oh, God!") remade the world from scratch, he "would make the avocados with smaller seeds"; judging from the committee's discussion of this topic, remaking C is much more complex. Several features were left unchanged for the sake of backward compatibility, including the priority of the operators (even though some of the bitwise operators could be used more comfortably if the priorities were modified). Similarly, the type of plain chars was specifically left unchanged and thus remains unspecified (i.e., not specifically typed as signed or unsigned). I agree with you that unsigned chars are more useful. I sometimes use the char type to hold small integer values, but they are usually non-negative integers. The char data type has been converted to int since the early days of the language. That eliminates having separate rules for character arithmetic. Character constants should be treated the same way (signed or unsigned) as character variables. Note that standard ASCII includes only seven-bit characters, so none of its values have the high-order bit set. The C language does not specify that programs must run if you include non-ASCII characters. (Actually it specifies exactly which source characters are acceptable, but that basically is the ASCII set.) With your example,

char c = 'è';
if (c == 'è')

you have used a character that is not specified as being standard. The compiler is not even obliged to compile the code. If you used the octal or hexadecimal escape sequence to represent the character, then the compiler would treat it as a regular character constant. I compiled with QuickC and ran the program in Listing 1 with one unexpected result. The results were:

Unequal -118 138
(char) Equal -118 -118
Hex Equal -118 -118
Hex (char) Equal -118 -118

Notice that the compiler treated both the char variable and the char constant as signed. However, it treated the non-standard character as a regular integer value.
Some compilers provide a runtime switch on the interpretation of character variables. You might try using one that has such a switch. On your next question, I am strongly in favor of using typedefs to define logical data types. Using typedefs is preferable to using #defines for consistency's sake, as there are many types which cannot be described in terms of a #define. Declaring variables with typedefs captures a significant amount of information for the maintenance programmer. Unfortunately the C standard, in my opinion, does not go far enough in checking the use of typedefs. My favorite illustration is:

typedef double SPEED;
typedef double TIME;
typedef double DISTANCE;

SPEED compute_speed(time, distance)
TIME time;
DISTANCE distance;
{
    SPEED speed;

    if (time != 0.0)
        speed = distance / time;
    else
        speed = 0.0;
    return speed;
}

and in another program:

SPEED car_speed;
TIME car_time;
DISTANCE car_distance;

car_speed = compute_speed(car_time, car_distance);
car_speed = compute_speed(car_distance, car_time);

Under the ANSI standard, both of these function calls are compatible, but logically one is erroneous. Some super lint or the compiler itself may one day use the typedef information for error checking. I agree that there is a problem with the type checking performed when comparing or assigning unsigned char pointers and regular char pointers. This problem is most irritating when it forces you to write the declaration

unsigned char *string = "ABC";

with a cast as:

unsigned char *string = (unsigned char *) "ABC";

The ANSI committee debated whether it would be okay to not require such a cast in an initialization statement, but decided that consistency in typing was more important. Of course, I strongly urge using full names for the type names, e.g., BOOLEAN instead of BOOL, etc. On your final question, I am in favor of readable and meaningful variable and function names.
Some people may have heard of studies that conclude otherwise, but ALongVariableName appears less readable to me than a_long_variable_name. The latter appears closer to what you would expect to read in normal text. How much you should use abbreviations in naming is an open issue. The more abbreviations you use, the more you will have to remember and the more the maintenance programmer will have to infer and comprehend when reading the program. For example, XMT for transmit and TX for transaction may be common, but does CMP stand for compare or compute?

Q I am developing a simulation program for study of our company's manufacturing plant using C language compilers on an IBM-PC/AT machine. I shall be thankful to you for sending information on various software tools in C language for incorporating graphics in the program. P.K. Gupta Gujarat, India

A The only package with which I personally have extensive experience is Essential Graphics by South Mountain Software, Inc., 76 So. Orange Avenue, South Orange, NJ 07079, (201) 762-6965 ($299 list, $230 street). You can distribute products built with Essential Graphics royalty-free, and you can use direct coordinates (your x,y values specify an exact pixel location) or world coordinates (your x,y values are transformed into a pixel location), the latter at some price in speed. The names in this package are somewhat unintelligible, since the developers tried to stay within an eight-character name limit. For example: grbx draws a box, grwx draws an x at a point, hsrect draws a rectangle with a hatch style and a label. As I mentioned above, I would prefer something like graph_box, graph_write_x, and hatch_rectangle_with_label. Essential Graphics also supports loading and saving PC Paintbrush .PCX files. There are several other packages on the market, including Halo Graphics and Advantage Graphics. Perhaps some of our readers may have comments on these or other packages.
Reader Responses: Commodore 128

In the May 1989 issue of The C Users Journal, I took note of the questions by Mr. David Ockrassa regarding printing special characters such as the braces, vertical bar, and tilde on the Commodore 128. Before I started programming the Amiga in C, I dealt with the same problem. The problem is two-fold in nature. Because these characters are not in the standard font set of the Commodore 128, the C language packages for that machine generally include an editor that redefines several character bitmaps to supply the missing ones. Each such character is saved with the file as a non-ASCII byte. The problem occurs when the file is printed, because the redefined characters may or may not have the same font set as that of the printer being used. The solution is to write a small printer utility in C. The accompanying code (Listing 2) accomplishes this task, and is available on most commercial bulletin boards. I wrote several printer drivers of this type for the Commodore 128 for use with different printers that have a few more features than the included code, such as pagination and filename/date headers. John D. Clark St. Louis, MO

MS Dynamic Data Exchange:

This letter is in response to Ken Libert's request for material concerning MS Dynamic Data Exchange. If you contact Microsoft's product support services and ask for Windows Software Development Kit support, you can request their Application Notes concerning Dynamic Data Exchange. With this publication you get a disk complete with examples and source. The DDEAPP example allows you to initiate a session with Excel and actually exchange cell data in multiple formats.
Tim Kuntz University of Pittsburgh

Listing 1

#include <stdio.h>

main()
{
    char c = 'è';           /* character with code 0x8A */
    char c1 = '\x8A';

    if (c == 'è')
        printf("\n Equal %d %d", c, 'è');
    else
        printf("\n Unequal %d %d", c, 'è');

    if (c == (char) 'è')
        printf("\n (char) Equal %d %d", c, (char) 'è');
    else
        printf("\n (char) Unequal %d %d", c, 'è');

    if (c == '\x8A')
        printf("\n Hex Equal %d %d", c, '\x8A');
    else
        printf("\n Hex Unequal %d %d", c, '\x8A');

    if (c == (char) '\x8A')
        printf("\n Hex (char) Equal %d %d", c, (char) '\x8A');
    else
        printf("\n Hex (char) Unequal %d %d", c, '\x8A');
}

Listing 2

/* Printer driver for Gemini 10x. The open()/close() calls and the
   integer FILE values follow the Commodore 128 C library, not ANSI C. */
#include <stdio.h>
#include <ctype.h>

main(argc, argv)
int argc;
char *argv[];
{
    unsigned int count;
    FILE infile, outfile;
    int c;                          /* int, so EOF can be detected */

    outfile = 5;
    open(outfile, 4, 7, " ");       /* device 4: the printer */
    for(count = 1; count < argc; count++) {
        infile = fopen(argv[count], "r");
        while((c = getc(infile)) != EOF) {
            switch(c) {
            case '{':  c = 123; break;
            case '}':  c = 125; break;
            case '\\': c = 92;  break;
            case '~':  c = 126; break;
            case '|':  c = 124; break;
            case '_':  c = 95;  break;
            default:
                if(islower(c))
                    c += 32;
                else
                    c -= 128;       /* PETSCII-to-ASCII adjustment */
            }
            putc(c, outfile);
        }
        close(infile);
    }
    close(outfile);
}

New Releases

Prolog And 'Curses' Added To Library

Kenji Hino

New Releases CUG297 -- Small Prolog Henri de Feraudy (France) has submitted a public domain Prolog interpreter. His Small Prolog follows a Cambridge syntax (LISP-like syntax) that has advantages for meta-programming and small code. Small Prolog includes most of the standard built-ins (predicates) based on Clocksin and Mellish's descriptions in Programming in Prolog, although it can be extended by creating more user-defined built-ins. The disk includes C source files, make files, documentation, and many Prolog example files that demonstrate Prolog features for C programmers who may be unfamiliar with Prolog. The source code is very portable and will compile under Turbo C v1.5 and Mark Williams Let's C v4 on PC clones, Mark Williams C v3.0 and Megamax Laser C on the Atari ST, and the Sun C compiler on the Sun-3. CUG298 -- PC Curses Jeffrey S.
Dean has contributed PC Curses, v0.8. This shareware release of PC Curses is a C window-function library designed to provide compatibility with the UNIX curses package. By fully utilizing the PC's features, this package is coded much more simply than the UNIX version. For example, there is no need for cursor-motion and screen-output optimization on the PC. Currently, there are two major versions of the curses database under UNIX; one is termcap, the other terminfo. PC Curses derives primarily from the former version, with some features of the latter. Moreover, additional routines (not in the original curses package) are provided for the PC user. The distribution disk includes a couple of demo programs, small- and large-model libraries for the Microsoft C v5.0 and Turbo C v1.5 compilers, and documentation that describes all the functions in the library. The source code is obtained by paying a $20 fee directly to the author.

Updates

CUG220 -- Window BOSS

Phillip A. Mongelluzzo (CT) from Star Guidance Consulting has submitted Revision 07.01.89 of The Window Boss. This release provides additional data-entry routines along with support for user-defined physical sizes (i.e., 43- and 50-line EGA/VGA screen sizes).

CUG198 -- MicroEmacs Source

William Bader has extensively updated the text editor MicroEmacs to v3.9. His update includes not only bug fixes of the old version, but also additional commands, portability improvements, and performance enhancements. The new features of MicroEmacs include built-in emulation of the DEC EDT editor, support for VT100/VT200 keypads, function keys, and scrolling regions, better VMS support such as a filter-buffer command and preservation of record-format attributes, extra commands such as inserting a C-format octal escape sequence and scrolling the screen horizontally, a callable interface to Emacs (you can call Emacs as a function), VMS subshell routines, support for ANSI color, a BINARY mode for MS-DOS, a pull-down menu, and more.
The enhancements include a faster search routine, faster lookup for normal keys and FNC macros, and a faster display routine. Bader has tested the new version of MicroEmacs using the following compilers and operating systems: VAX-11 C under VMS 4.1 on a VAX-11/750, Microsoft C 5.0 under MS-DOS 3.20, Turbo C 1.5 under MS-DOS 3.20, CI86 2.30J under MS-DOS 3.20, Microsoft C under XENIX 386, cc under SunOS 3.5 on a Sun 3/360C, cc under SunOS 4.0 on a Sun 386i, and cc under BSD 2.9 on a PDP-11/70. In order to create an executable for your environment, you need to turn on/off the switches for Machine/OS definitions, Compiler definitions, Terminal Output definitions, and Configuration options in the header file, estruct.h. The distribution setting is to compile under MS-DOS using Turbo C.

On The Networks

How To Get Net Software

Sydney S. Weinstein

Sydney S. Weinstein, CDP, CCP is a consultant, columnist, author, and President of Datacomp Systems, Inc., a consulting and contract programming firm specializing in databases, data presentation and windowing, transaction processing, networking, testing and test suites, and device management for UNIX and MS-DOS. He can be contacted care of Datacomp Systems, Inc., 3837 Byron Road, Huntingdon Valley, PA 19006-2320. First, an introduction, and a thank you. I am the new "Contributing Editor" of the "On The Networks" column. I have written before for The C Users Journal, so hopefully I won't be a total stranger to you. And, as David Fiedler said in the last CUJ, I am the Elm coordinator. (Elm, itself, is a large piece of freely distributable software.) I can be reached at syd@DSI.COM, for those with Internet access, or at {bpa, vu-vlsi}!dsinc!syd for those without Internet access. I don't plan any change in the scope or content of this column. I will attempt to report on the latest freely distributable software available on Usenet and the Internet.
As David did, I am willing to forward a list of neighboring sites for access, provided you send me a self-addressed, stamped envelope; if you have net access but need a news neighbor, I will also reply to electronic mail asking for nearby news sites. To David Fiedler, a well-earned thank-you for his two-year tenure in this spot. Many megabytes of useful software were highlighted here. His tireless efforts to find neighbors for those sites that requested them are also gratefully appreciated. It was with his help that our site found its first news neighbor several years ago. However, I highly doubt I can keep up with his run of puns. Some Definitions For the past two years, the terms Usenet, Internet, internet, and "the net" have been bandied about in this column. I would like to add a new one: "freely distributable software." Some definitions are in order. Usenet, often referred to as "the net," is a loose collection of cooperating computers. In the past, all of Usenet ran UNIX, but now, with other computers and operating systems supporting UUCP, hosts can be running anything from MS-DOS to VAX/VMS. All that is required to be considered a computer on Usenet is that you communicate via the UNIX-to-UNIX Communications Protocol (UUCP) with another computer. Usenet consists of electronic mail, file transfers, and network news. It is via network news that most of the programs you read about in this column are distributed. If your computer talks to Usenet or to another computer via some protocol other than UUCP, you are considered to be on an internet (lowercase "i"), short for inter-network. This just means that you are using some network other than the UUCP-based Usenet. This generic internet includes "the Internet" and several other networks such as CSNET and BITNET. The actual connection to Usenet is via a gateway computer that talks to both the network you use and Usenet. 
The Internet (capital "I") is the computer network loosely managed by the Network Information Center at SRI. The Internet is a collection of networks that grew out of the Defense Department's ARPANET (Advanced Research Projects Agency Network). Usenet sites make phone calls to other computers; the Internet is mostly machines connected with dedicated leased lines. These lines usually run faster than the dial-up lines used by UUCP. The Internet has many sub-networks associated with it, including NSFNET, the National Science Foundation Network. These newer networks run at much higher speeds and currently also pick up a lot of the long distance traffic for Usenet's Network News. In my area, the local NSFNET-related network is called PREPnet and has a backbone consisting of 1.544Mb/s (million bits per second) data links; each site has either a 1.544Mb/s or a 56kb/s (thousand bits per second) hookup to the network. The main NSFNET backbone is now all 1.544Mb/s data links and is quickly upgrading to 45Mb/s data links as they become available. Whereas only mail and news are usually available over Usenet via UUCP, the Internet runs the TCP/IP protocol and supports news (NNTP, Network News Transfer Protocol), mail (SMTP, Simple Mail Transfer Protocol), remote logins to any computer on the network provided you have an account there (telnet), remote file transfer (FTP, File Transfer Protocol), and many other services. All of these services coexist and work in real time. The problem with the Internet providing much of the bulk transfer for Usenet is that the two use different addressing methods. Since a large amount of the software mentioned in this column comes from Usenet or the Internet, you'll need to understand how to format the two types of addresses. A UUCP or Usenet address is made up of site names separated by exclamation points, as in bpa!dsinc!syd. 
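Mechanically, a bang path is just a list of hop names read left to right; a minimal sketch of pulling one apart (a hypothetical helper, not part of any real mail software):

```c
#include <string.h>

/* Split a UUCP bang path such as "bpa!dsinc!syd" into its hops.
 * Writes up to maxhops pointers into hops[] (pointing into path,
 * which is modified in place) and returns the number of hops.
 * The last element is the mailbox; the others are relay sites. */
int split_bang_path(char *path, char *hops[], int maxhops)
{
    int n = 0;
    char *p = strtok(path, "!");

    while (p != NULL && n < maxhops) {
        hops[n++] = p;
        p = strtok(NULL, "!");
    }
    return n;
}
```

Applied to bpa!dsinc!syd, this yields the relay sites bpa and dsinc followed by the mailbox syd; each site in the chain strips its own name and forwards the remainder.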
If a site wants to mention more than one "well-known site" to use as a route, it usually lists them in curly braces as in {bpa, vu-vlsi}!dsinc!syd (meaning you can use either bpa!dsinc!syd or vu-vlsi!dsinc!syd). Such addresses assume that you know the complete path from your site to one of the named "well-known sites". Some systems run programs to help with this routing, and Usenet's UUCP Mapping Project publishes maps to automate this process. However, not all sites have registered to be listed in these maps. Registration is free and accomplished by sending your entry to rutgers!uucpmap. The maps are continuously updated and distributed via the comp.mail.maps news group. On the Internet, all sites have a unique "Fully Qualified Domain Name" which is administered by the NIC. My site's domain name is node.DSI.COM, where node is the individual computer at my site. Thus, my full current address is syd@dsinc.DSI.COM, but our mailer, like the mailers at a lot of Internet sites, is smart and knows how to forward the mail to me even if you send it to syd@DSI.COM. This allows me to move around within the DSI.COM domain without having to tell everyone a new address. The Internet does not require users to know the complete path to the site; it is sufficient to know the domain name. Now a word of warning. Mixing both @ and ! in the same address leads to trouble. Not everyone follows the standard and processes the addresses correctly. Converting sitea!user@DSI.COM to a UUCP address would produce dsinc!sitea!user. Note that the @ has higher precedence than the !. Many sites get this wrong, causing your mail to bounce (be returned to you as undeliverable). Some sites, ours included, allow UUCP mail to have addresses including domain names in the ! path, as in dsinc!host.domain.type!user. Where allowed, this convention is usually more reliable than mixing the ! and @s. Lastly, what is Public Domain Software and what is Freely Distributable Software? 
Much of the software described in this column is free in that no licensing fee is required for personal use. In some cases even commercial users aren't required to pay a licensing fee. However, almost all of the software mentioned in this column is not in the Public Domain. For software to be in the Public Domain, either the copyright must expire (and not be renewed) or the authors must specifically renounce copyright protection. The copyright to most software mentioned in this column is reserved by the author or some group. Though the copyright is reserved, the holders have given the user the right to use and distribute the software without fee. This does not place the software in the public domain. You still cannot sell this software nor pretend that you wrote it. Many of the licensing agreements restrict how the software can be used for business purposes. Freely Distributable Software is also different from Shareware. Shareware expects (but doesn't require) the user to pay a fee if they intend to continue using the program. Freely distributable software does not. Now, how do you get the software mentioned in this column? Much of it is distributed in Usenet's network news, especially in the comp.sources.unix and comp.sources.misc news groups. Game software is in the comp.sources.games group. There are also groups for Amigas, Atari STs, Macs, Suns, and computers running the X windowing system. The Usenet news groups are distributed via a store-and-forward broadcast from Usenet neighbor to Usenet neighbor, either via UUCP or NNTP. However, news articles are kept online at a particular site for only a short period of time, usually less than two weeks. By the time a piece of software appears in this column, it will long since have expired and been deleted. Thus, it is necessary to access a news archive site. Many sites around the country have agreed to archive specific news groups. 
These sites are listed in the comp.archives news group. Many sites are also identified as archive sites in their Usenet Mapping Project map entry. Some have even been listed in this column. These sites allow access to their archives to retrieve the sources. How one accesses the archives depends on where they are and how each site has set up access. Most archives are available for either FTP or UUCP access, and a few allow both. If a site supports FTP access, you need to be on the Internet to reach it. FTP opens a direct connection to the FTP server on that system and transfers the files directly to your system. FTP will prompt for a user name and optionally a password. Most FTP archive sites allow a user name of anonymous. If it then prompts for a password, any password will work, but convention and courtesy dictate that you use your name and site address for the password. If a site supports UUCP access, anyone with UUCP can access the archives. Most sites of this type publish a sample entry for the Systems (L.sys) file showing the system name, the phone number of their modems, the connection speeds supported, and the login sequence. Using the uucp command, one can poll the system directly and retrieve the software. Many sites restrict the hours during which you should access their modems. Courtesy dictates that you follow their requests, and some sites enforce the limit with programs. Be sure to call far enough before the end of the period to complete your transfer in time. A third method, used for smaller files, allows access to an electronic mail-based archive server. With these sites, you send an electronic mail message to the archive server's mailbox name specifying the files you wish. The files are then returned to you via electronic mail. Remember that many sites have a limit on the size of a single mail message, so don't ask for too much at once. 
Also remember that the archive server is a program, so phrase your request exactly as specified in the instructions for that archive server, and limit your message to exactly that request. Other comments in the message could confuse the program, and it might not honor your request. Lastly, for those sites not connected to any network, some sites will copy the software onto your media if you send them a disk or tape along with return postage and a mailer. Other sites sell media with the software already copied onto it. This is especially useful for the largest distributions, such as the X windowing system, which fills multiple tapes. For those sites without Internet access that do subscribe to UUNET, UUNET will retrieve the files via FTP and make them available for UUCP access. And to come... Starting in February, back to more new software from Usenet's source newsgroups and news from the Internet and public access sites. If you have an archive of UUCP-accessible software and would like even more access to it, drop me a note via electronic mail and I'll try to get it into an upcoming column. Until then, a slight paraphrase of David's tag line: see you on the nets! PC-METRIC -- A Measuring Tool For Software Larry Versaw Larry Versaw is a systems engineer at Electronic Data Systems' Corporate Communications Division. His 1984 master's thesis was entitled Measuring the Size, Structure, and Complexity of Software. He may be contacted care of 5400 Legacy Drive, Plano, TX 75024. Have you ever wanted to compare the complexity of two programs or to tell how long it took to develop them? Have you ever needed a precise measure of programmer productivity? No one yet can produce truly reliable answers to these problems, but researchers in the 1970s invented many software metrics and have since conducted hundreds of experiments to see what information could be derived by analyzing program source code. 
Some metrics purport to measure software complexity; others gauge program size or calculate how well structured a program is. The researchers developed many static code analyzers for use in their software metrics experiments, but few such tools were commercially marketed. PC-METRIC, developed by SET Laboratories, Inc., is one of the few stand-alone software metrics programs, if not the only one, sold today. To evaluate PC-METRIC, I tried it out on 80 source files containing 25,000 lines of working C code. That exercise proved PC-METRIC to be a reliable product, an efficient program measurement tool that would be indispensable to anyone wishing to use software metrics in his work. The Product For this article I evaluated v1.1, then v2.3 of PC-METRIC. In addition to the C language versions which I examined, SET Laboratories has produced metrics programs for Ada, Assembler, COBOL, FORTRAN, Modula-2, and Pascal. Some languages are supported on systems other than MS-DOS. PC-METRIC specializes in static code analysis; that is, it reports certain quantifiable attributes of program source code without executing it. These attributes include the number of source lines, the number of executable statements, and a dozen other quantities derived by counting certain program elements. Software metrics experiments have usually shown correlations between these kinds of metrics and actual, observed software management factors, such as programmer skill, number of remaining bugs, and actual programming effort. PC-METRIC is based on the work of several of the pioneers in software metrics, notably Tom McCabe and Maurice Halstead. McCabe [McCabe 1976] proposed a measure of program control flow complexity based on a program's directed control flow graph. This metric, called cyclomatic complexity, may be calculated as one plus the number of branches (if statements, loops, alternatives in case statements) in a program. 
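McCabe's count is simple enough to sketch. The toy function below illustrates the idea only: it scans for branch keywords as whole words, and unlike a real analyzer such as PC-METRIC it does not tokenize properly or skip comments and string literals.

```c
#include <ctype.h>
#include <string.h>

/* A naive sketch of cyclomatic complexity: one plus the number
 * of branch points found in the source text. */
int cyclomatic(const char *src)
{
    static const char *branch[] = { "if", "while", "for", "case" };
    int v = 1;                           /* McCabe's "one plus..." */
    size_t i, k, len = strlen(src);

    for (i = 0; i < len; i++) {
        if (i > 0 && (isalnum((unsigned char)src[i-1]) || src[i-1] == '_'))
            continue;                    /* not at a word boundary */
        for (k = 0; k < sizeof branch / sizeof branch[0]; k++) {
            size_t w = strlen(branch[k]);
            if (strncmp(src + i, branch[k], w) == 0 &&
                !isalnum((unsigned char)src[i+w]) && src[i+w] != '_')
                v++;                     /* one more branch point  */
        }
    }
    return v;
}
```

A function with a single if and a single for loop, for example, scores 3.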
PC-METRIC reports two variants of cyclomatic complexity for each function it analyzes. McCabe's metric is widely accepted and intuitively satisfying as a complexity measure because it represents the amount of program logic that must be understood and retained to understand an algorithm. One of the most imaginative and ingenious models of software, including software size, was developed by the late Professor Halstead [Halstead 1977]. Halstead's system, labeled software physics, is ultimately based on counts of operators and operands in program source code. Several of PC-METRIC's metrics, including length, estimated length, purity ratio, volume, the effort metric, estimated time to develop, and estimated errors, are implementations of Halstead's software physics formulae. Some have seriously questioned the theoretical basis underlying Halstead's model, and Halstead's attempt to bring theory from the realm of psychology to bear on software development has been widely discounted [Coulter 1981, Perlis 1981]. On the other hand, some rather impressive correlations have been observed between certain of Halstead's metrics and such management factors as code quality, programming time, and debugging effort [Gordon 1979, Curtis 1979, Funami 1976, Paige 1980]. If you are experienced with software metrics, you may find some of your favorite metrics missing from PC-METRIC's repertoire. However, PC-METRIC supports more measures of program size and complexity than are actually needed. Most size and complexity metrics are highly correlated with each other, so that beyond the first two or three, additional size and complexity metrics are redundant. In a study which analyzed great quantities of source code written by diverse programmers in C, Ada, PL/I and Pascal, no statistically significant differences were found among the reliability of different size metrics [Versaw 1984]. They all measure the same attribute, after all. 
Variations in programming style notwithstanding, it is my belief that lines of code remains as good a measure of program size as any other measure we have today, and is almost as good a measure of complexity as any other. Research continues on the subject, but on a smaller scale than ten years ago. Installing and using PC-METRIC is simplicity itself. A user must learn only one command, CMET, which runs interactively or in batch mode. PC-METRIC can be configured for different dialects of C by modifying a table of keywords and symbols stored in an ASCII text file. As PC-METRIC analyzes source code, it produces two reports. The complexity analysis report lists the metrics values calculated for each function, and the combined values for the entire module being considered. In the new version of PC-METRIC, SET has remedied the worst problem with version 1.1, which was its inability to analyze units of source code larger than one file. The second report file, called the exceptions report, highlights all measured values that lie outside predetermined, user-defined limits. Both the analysis report and the exceptions report are output as ASCII files. In the current version of PC-METRIC, these reports are suitable for printing without any manual editing or reformatting. The current version also provides a CONVERT utility which can convert the report data into a comma-delimited text file suitable for uploading into many spreadsheet or database packages. This is an especially valuable addition to the PC-METRIC package. Program attributes that cannot be measured by simply counting certain operators and identifiers are unfortunately beyond the scope of PC-METRIC. These would include attributes such as the degree of information hiding, module coupling, function binding, and efficiency. If we could only measure these attributes objectively and automatically, it would greatly enhance the practice of software engineering. 
Where PC-METRIC does excel is in calculating reliably the most common size and complexity metrics with a minimum of fuss at a reasonable speed (4000 lines per minute on a 10 MHz AT-type computer). System Requirements PC-METRIC requires far less memory and disk space than any C compiler would, so hardware requirements do not limit its use. The Audience PC-METRIC is intended primarily for two kinds of users. The first is software developers who would use a statistical analysis of their code as a help in identifying overly complex modules or functions. The PC-METRIC manual correctly identifies programmer feedback as an important application of PC-METRIC. The second kind of person who needs PC-METRIC is the manager or software project leader who would use software metrics as a tool to monitor programmer compliance with local standards of function size, module complexity, or other quantifiable program aspects. Documentation All bases are covered in PC-METRIC's three-part manual. Part 1 provides a well-written tutorial on the field of software metrics, concentrating on the specific metrics obtainable with PC-METRIC. It even includes a brief annotated bibliography of software metrics literature. Users with little prior exposure to the field of software metrics should be sure to read this part. Part 2 describes how to install, configure, and use PC-METRIC. It too is well organized and gives the right number of examples. PC-METRIC's counting strategy is documented toward the end of this section. Part 3, "Applying PC-METRIC", instructs users on what to do with all those numbers PC-METRIC generates. It first documents the indispensable new CONVERT utility mentioned above. Then it explains ways to interpret the results: how to properly use software measures as a feedback tool or resource estimation tool in practice. Support SET Labs offers technical support by telephone for their products and will answer general questions on software metrics. 
SET offers site licensing as well as individual licenses. If you have a particular machine or language for which you would like a version of PC-METRIC, SET Labs will usually do a port for the price of a single site license. Conclusions PC-METRIC is an indispensable tool, and perhaps the only tool in its class, for analyzing program size and complexity using the software metrics it provides. By cleaning up the reports and by providing the CONVERT utility, the new version of PC-METRIC has enhanced users' ability to analyze and apply program metrics. PC-METRIC applies state-of-the-art methods for objectively measuring two basic attributes of program source code: size and complexity. The usefulness of these measures is variable, but not because of any deficiency in PC-METRIC itself. PC-METRIC, and counting programs in general, find their surest application in measuring adherence to a specific coding standard. I recommend PC-METRIC for programmers and managers as a tool for monitoring adherence to their coding standard, which could, and probably should, include some complexity metrics. I recommend it also as a tool for identifying overly complex modules that need extra testing or rewriting. The list price is $199. You can contact SET Labs for more information at P.O. Box 86327, Portland, OR 97283, (503) 289-4758. References Coulter, Neal S., Applications of Psychology in Software Science, Proceedings of IEEE COMPSAC 81, (1981), 50-51. Curtis, Bill; Sheppard, Sylvia; and Milliman, Phil, Third Time Charm: Stronger Prediction of Programmer Performance by Software Complexity Metrics, Proceedings of the Fourth International Conference on Software Engineering, (1979), 356-360. Funami, Y., and Halstead, M.H., A Software Physics Analysis of Akiyama's Debugging Data, Proceedings of the Symposium on Computer Software Models, (1976), 133-138. Gordon, Ronald, A Quantitative Justification for a Measure of Program Clarity, IEEE Transactions on Software Engineering, IV (March 1979), 121-128. 
Halstead, Maurice, Elements of Software Science, New York, Elsevier, 1977. McCabe, T.J., A Complexity Measure, IEEE Transactions on Software Engineering, II (December 1976), 308-320. Paige, M., A Metric for Software Test Planning, Proceedings of IEEE COMPSAC 80, (1980), 499-504. Perlis, Alan J.; Sayward, Frederick G.; and Shaw, Mary, editors, Software Metrics: An Analysis and Evaluation, Cambridge, Massachusetts, MIT Press, 1981. Versaw, Larry, A Tool for Measuring the Size, Structure and Complexity of Software, thesis, Denton, Texas, North Texas State University, 1984. GRAD Graphics Library Ron Burk and Helen Custer Ron Burk has a BSEE from the University of Kansas and has been a programmer for the past 10 years. He is currently president of Burk Labs, a small software consulting firm. Helen Custer holds degrees in Computer Science, English, and Psychology from the University of Kansas and is currently a Senior Software Technical Writer for a Fortune 500 company. She has coauthored books on C, GW-BASIC, and Z-BASIC. Both may be contacted at Burk Labs, P.O. Box 3082, Redmond, WA 98073-3082. The GRAD Graphics Library, written by Conrad Kwok, is a shareware package for drawing simple graphics images, including circles, lines, ellipses, arcs, and rectangles. It can also fill regions, display characters, and dump screen graphics to your printer. The 50-odd graphics functions are carefully documented in a 100-page user's manual. The package is written in Microsoft C v4.0 and 8088 assembly language for PC/XT/AT clones. GRAD can also be compiled with Turbo C, and directions for doing so are included on the disks; a few minor changes are required. GRAD supports both CGA (640 x 200) and HGA (720 x 348) graphics cards, but unfortunately, it only supports one device at a time. You link with the library that corresponds to the device you want to use; there is no auto-detection of the graphics adapter. 
The routines are modularized in such a way that it may be possible to make them work with other graphics devices by changing one or two files of source code. However, a graphics device is not absolutely necessary, as GRAD allows you to define up to nine virtual graphics screens at run time. GRAD also supports several printers, including the Epson FX-80, the Okidata ML192, and compatibles, as well as laser printers using the JLASER card. You can also configure other printers to work with GRAD. The GRAD user's manual and assorted documentation files thoroughly document the functions available in GRAD. The writing is friendly and, in addition to GRAD, the files document a number of concepts relating to graphics libraries in general. The documentation describes how fonts are viewed by a graphics package, how to use a graphics coordinate system, and what a virtual graphics screen is, among other things. Example code is provided for functions that are difficult to describe. Pixels Vs. Lines The graphics screen on most personal computers is pixel-oriented; it is made up of dots that you can turn on or off. A pen plotter, on the other hand, is line-oriented; everything it draws is made up of line segments. The GRAD library is oriented toward pixel graphics. For example, it supports the ability to "grab" a rectangular portion of the screen and transfer it to another part of the screen. That sort of operation could not be implemented with a pen plotter. You could, however, use GRAD as a PC device driver for a more general set of line-oriented library functions. GRAD supplies almost all of the primitives you would need for such a project. Also, most printers support pixel graphics, so they can serve as hard-copy devices for programs that use pixel graphics. If your printer is similar to the Epson FX-80 or the Okidata ML192, you can adapt the software to work with your printer. An appendix at the back of the manual documents that process. 
In the general case, however, you may have to buy the source code from the author to make GRAD work with your printer. Standard Transformations In some kinds of graphics, you find yourself drawing the same basic symbol in slightly different ways (different proportions, different locations on the screen, and so on). Three types of transformations of graphics are commonly supported by high-level graphics libraries: Translation--Moving a graphic element (a square, for example) to a new location. Scaling--Making a graphic element appear shorter and fatter, or taller and thinner. Rotation--Turning a graphic element around an axis. GRAD supports graphics translation by allowing you to change the value for the upper left corner, or "origin", of your frame. For example, if you wish a graphic element to appear multiple times in your final drawing, you can create a subroutine that draws the element, then call that subroutine multiple times. Between calls to the subroutine, you simply change the origin for the element. GRAD does not support graphics scaling or rotation. If you want to draw the same symbol with different heights or widths, you must implement the scaling with your own code. Likewise, if you want to rotate a graphic image so that it appears sideways or upside-down, you'll have to write your own code to do this. One reason you might want to do a transformation such as scaling is to solve the problem of aspect ratio. Aspect ratio is the ratio of a pixel's height to its width. GRAD assumes that each pixel is square, the same height as width. However, on a typical CGA monitor, each pixel is rectangular instead of square, that is, its aspect ratio is not 1:1. The aspect ratio problem becomes very clear when you ask GRAD to draw a circle on a CGA monitor. It draws a true circle, but because the pixels are not square, the result on the screen is a "stretched" circle (an ellipse). 
In a line-based graphics library, this problem can be solved by applying the appropriate scaling transformation just before translating the line into pixels. In GRAD, however, there isn't much you can do except take the problem into account in your code and draw a rectangle to get a square, an ellipse to get a circle, and so on. Virtual Screens A virtual screen is just like the real screen in every way--you just can't see it. Suppose you want your graphics program to have the ability to undo the last drawing request the user made. One way to accomplish the visual part of this task is to use a virtual screen. For each user request that is not an undo request, you first perform the previous request on the virtual screen, then perform the new request on the real screen. If the request is an undo, you could simply copy the virtual screen to the real screen. GRAD provides virtual screens which it calls "frames". A frame is a rectangular memory area where a graphic image is stored. If the memory area corresponds to video memory, then the graphic is visible on the screen. If the memory area is regular memory, the frame is a virtual graphics screen. A graphic image created in this area can only be seen by dumping it to the printer or by copying it to the video memory. Frames are especially useful for windowing operations, as described in the following section. Drawing Attributes GRAD allows you to specify line styles and writing modes. Normally, when you draw a line across the screen, you get a solid line. A line style, however, allows you to specify that all lines are dotted lines, or dashed lines, or almost any pattern of dots and dashes you like. Another drawing attribute that GRAD lets you specify is the writing mode. On a pen plotter, a line is a line--you can never erase an existing line. On a graphics screen, however, there are several interesting possibilities. Usually, you want the screen to look like it would on a pen plotter. 
This is called OR mode, since it is accomplished by bitwise ORing the pixels to be drawn with the screen pixels' existing value. GRAD also supports an XOR mode and an AND mode. The XOR mode can be used to "erase" lines, because if you draw a line in OR mode and then redraw the line in XOR mode, the line disappears. This isn't perfect, however; if there is a second line on the screen that intersects the first one, it will have a "hole" in it, because the pixel where the two lines intersected is turned off. You can also use XOR mode to achieve a kind of reverse-video effect, by turning on a block of pixels, switching to XOR mode, then drawing on the block. Drawing lines in AND mode doesn't make much sense, because the only pixels that will get turned on are those that were already on. In other words, it will look as though nothing got drawn. AND mode is useful for Bit-Block Transfers, however. Bit-Block Transfers, or bitblts (pronounced "bitblits"), are at the heart of windowing systems that operate in graphics mode. For example, moving a window from one place to another is a bitblt operation; so is removing a window (copying a block of background pattern to it). GRAD provides basic bitblt operations that allow you to transfer blocks between virtual screens and to and from files. GRAD's bitblt operations obey the current writing mode, so you can combine the block transfers with the bit-wise modes to do things like erase a window or cause a window to appear in reverse-video. Clipping Clipping is the ability to restrict graphics output to a specific (usually rectangular) region of the screen. For example, if you are using an inch-high strip along the bottom of the screen to display status information about your program, you want to ensure that no other part of your output strays into that area. If your graphics library supports clipping, you can define a clipping rectangle. 
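Mechanically, such a rectangle is no more than a test applied before every pixel write; the helpers below are a sketch of the idea, not GRAD's actual API:

```c
/* A clipping region reduces to a predicate checked before each
 * pixel is written; pixels outside the region are discarded. */
struct rect { int left, top, right, bottom; };   /* inclusive bounds */

int in_clip(const struct rect *clip, int x, int y)
{
    return x >= clip->left && x <= clip->right &&
           y >= clip->top  && y <= clip->bottom;
}

/* Write a pixel into a width-pixels-wide byte buffer only if it
 * falls inside the clipping rectangle. */
void put_pixel_clipped(unsigned char *screen, int width,
                       const struct rect *clip, int x, int y)
{
    if (in_clip(clip, x, y))
        screen[y * width + x] = 1;
}
```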
Your program can then continue to draw anywhere it likes, but only that portion of the drawing that lies within the clipping region appears. GRAD allows you to specify a single, rectangular clipping region called a "window". There is no on/off function to disable or enable the defined clipping region. Instead, GRAD supplies a ResetWin() function that redefines the clipping region to be the entire virtual screen (which effectively turns clipping off). Drawing Text Whether you are drawing business charts or flowcharts, you inevitably need to display text along with your graphics. There are two general ways to draw text in graphics mode on a pixel-oriented device. Bitmapped fonts are the kind you normally see in text mode on a PC screen. As the name implies, they are defined in terms of a set of bits that are on or off, each bit corresponding to a pixel of the overall character. Bitmapped fonts are easy to define and fast to display, but difficult to scale up and down in size, and difficult to clip except on character boundaries. Stroke fonts, on the other hand, are stored as line segments and, therefore, can usually be scaled up and down in size, stretched in any direction (to form slanted text, for example), and even rotated to arbitrary angles. GRAD has no stroke fonts but supports bitmapped fonts. These fonts can be stored in files and loaded into memory dynamically, as needed. This is useful when you want to use many fonts but don't want to consume a lot of memory. You can get the effect of rotated fonts (one of four 90-degree rotations) by using a specially rotated font file. There are 18 font files on the GRAD disk. Although most of these are variations on a couple of fonts, they provide good examples of what you can do. You can also make bitmapped fonts that have variable width. This looks more professional, especially with larger fonts. GRAD also supplies a graphics input function that reads from the keyboard. 
This is very handy when you need to query the user while you are drawing graphics, since you will want the keyboard input to be echoed on the screen with graphic text. Remember that just calling gets() probably won't produce the desired result when the screen is in graphics mode. Picture Segments If you are writing a program that allows the user to manipulate the graphics drawn on the screen, you may want to provide a way for them to control units of the picture more complicated than individual pixels or lines. For example, the user of an architectural program may want to move an entire wall (including windows and doors) as a unit. Picture segments support this type of operation. A picture segment is just a sequence of drawing commands that you can store, retrieve, and use to draw the same object in a variety of places on the screen. GRAD does not provide picture segments as such, but it defines a draw() function which is a step in that direction. draw() takes three arguments: a C string containing drawing commands and two integer arguments that can be used to parameterize the commands in the string. The key feature of the graphics commands that you store in the string is that they are relative to the current drawing coordinate. For example, here is a command that draws a rectangle at the current location. Draw("RT10 DN5 LF10 UP5", 0, 0); It always draws the same size rectangle; however, it could be parameterized like this: Draw("RT%OX DN%OY, LF%OX, UP%OY", 3, 10); In this case, the arguments to draw() alter the symbol that is specified in the command string. Notice that you could build up command strings, save them to files, and bring them back later -- just as you would use a symbol library. GRAD just supplies the basic command string ability, though. You would have to design your own functions to manage a symbol library. Graphics Environment If you were going to implement a symbol library, you would want the drawing of symbols to be modular. 
The symbol might draw in a different line style, graphics mode, or font, or use a different clipping window than the calling routine. A modular library would ensure that each symbol routine resets all these attributes back to their original values after the symbol is drawn. Fortunately, there is an easier solution. GRAD groups attributes like the current origin, clip region, line style, font, and so on, into a bundle which it calls an environment. The modular symbol or graphics routine can simply save the current environment before it begins drawing, and restore it after all its graphics operations are complete. Utility Programs The GRAD disk contains several utility programs, which Conrad Kwok wrote as sample programs for the library. The first program, Interp, is an interpreter for GRAD library functions. You can place a series of GRAD function calls in a file, then give that file name as an argument to Interp. Interp interprets the graphics commands and draws the resulting graphic on the screen. This is a fast way to experiment with the library, since you don't have to recompile anything to make changes to what you're drawing. The input to the interpreter mimics the analogous C functions. Whenever a particular function returns a value, you can simply write: var1 = function(val1, val2, ...) The variable that you name (in this case, var1) is created and initialized by the value returned by the function. Similarly, for functions that return values through pointers, you can type something like this: function (&var1, &var2, ...) The variable names available for use are hard-coded in the program, but the source to the interpreter is supplied, so you could easily extend it. Listing 1 shows a sample input file which draws an ellipse around the text "The C User's Journal". MPrint (Merge Print) is a variation of Interp. MPrint allows you to specify a file containing lines of text that are merged with the graphics drawn by the interpreter.
In other words, you can print graphics in graphics mode on the printer and print the text portions in text mode, which is much faster than printing text in graphics mode. Distribution and Licensing The GRAD graphics library is a good, basic, integer graphics system. It contains a complete set of primitives which could be used as a base for a more sophisticated graphics package, a floating-point package, for example. The main disadvantage of the library is that it is not written for multiple graphics adaptors. You must compile the library for a specific adaptor. Conrad Kwok, GRAD's author, is distributing this graphics package as shareware. If you find his program useful, he requests that you send a contribution of $20 to him. If you send $20 or more, you will receive updates to the library. If you send a contribution of $60 or more, you will get the source for the latest version of GRAD, as well as a programmer reference manual which documents the internal data structures and algorithms used in the library. The source is copyrighted. The licensing terms for GRAD are as follows: You may freely copy and distribute the GRAD library and related programs provided the documentation and sample programs are not modified in any way. However, you may write additions to the library and distribute those along with the original library. You may not charge a fee for distributing the library or your enhancements to it. However, you may charge a small fee for the cost of the disk, shipping, and handling. Your program must be in the public domain and must contain a message indicating that it contains code from GRAD, written by Conrad Kwok. If your program does not meet the above requirements, you must get written permission from Conrad Kwok before distributing it. Publisher's Forum To show our appreciation for your readership and to commemorate The C Users Journal's second anniversary, we've bound a combination calendar and reference card into this issue. P.J. 
Plauger prepared the reference card. It summarizes calling conventions for Standard C library functions. Susan, our staff artist, prepared the calendar. We hope you find at least one side useful. This issue begins our third year of publishing The C Users Journal. It also marks our first issue on a monthly publication cycle. Two years ago, when we first combined The C Journal and The C Users Group Newsletter, the Journal was 72 pages and went to 6800 subscribers. This issue of 144 pages will be distributed to over 23,000 subscribers; another 5400 copies will go to newsstand distributors. The magazine and related activities now employ 16 persons -- up from about ten a year ago. To accommodate this extra staff, we've just moved into larger quarters, about two blocks from our old office. (We're all moved in, but we're not yet 100 percent functional. There are still little things missing -- like my terminal, and Donna's doorknob, and Kenji's return air vent, and ...) We think all these signs are cause to celebrate. (Well, maybe all but the moving ... that's pretty traumatic.) Since it's your interest in C that has stimulated this activity, we wanted to share the celebration with you. Unfortunately, it's difficult to coordinate a celebration involving over 23,000 persons scattered around the globe. We considered mailing you each a party favor with instructions about when to toot your whistle, but the reference card seemed more practical. If nothing else, we're always practical. (Personally, I'm celebrating by trying to catch up on some lost sleep.) We hope you like the card. We offer it with our heartfelt gratitude; thanks for reading the magazine, thanks for writing for the magazine, and thanks for advertising in the magazine. We'll be doing our best to earn your continued participation. Sincerely yours, Robert Ward Editor/Publisher New Products Industry-Related News & Announcements Oasys Offers Green Hills C++ Oasys, Inc. 
has introduced the Green Hills C++ compiler, which supports cross and native mode development. Green Hills C++ is integrated with the Oasys 680x0 and 88000 Cross Tool Kits, enabling embedded systems developers to take advantage of object-oriented techniques. Green Hills C++ supports Kernighan and Ritchie C and complies with the ANSI C standard. Green Hills C++ provides object oriented programming features such as data abstraction, strong type checking, and overloading of function names and operators. New C++ features include classes with scope, and overloading new and arrow operators. Green Hills C++ also includes compiler optimizing techniques such as inlining, loop unrolling and register caching. The Green Hills C++ compiler is available from Oasys on the Sun-3. Oasys claims that the compiler will be ported to other UNIX workstations and minicomputers soon. Oasys supports Designer C++, the C++ translator developed by Glockenspiel, Ltd. Oasys will provide current customers with the ability to upgrade to the Green Hills C++ compiler. For more information contact Oasys at 230 Second Ave., Waltham, MA 02154 (617) 890-7889; FAX (617) 890-4644. Library Brings UNIX Functions To Hercules Card Users Certified Scientific Software has announced a subroutine package that allows programmers using most PC-based UNIX systems to take full advantage of Hercules-type monochrome graphics adapters. The package includes the standard UNIX plot(3) subroutines plus many enhancements, such as patterned fills of circles, rectangles and user-defined shapes; two fonts -- 8x8 pixel and 8x16 pixel -- for labels; clipping windows; five pixel write-modes, including bit-set, bit-clear and exclusive-or; and routines to support double buffering using the Hercules adapter's two graphics pages, making animation effects possible. The subroutines use only integer code, so they will run efficiently whether or not floating-point hardware is installed. A 10-page manual and demonstration C code are included.
The package is currently available for Interactive Systems 386/ix; AT&T System V/386; Microport System V/AT; XENIX 286 v2.2/2.3 and 386 v2.3; and VENIX v2.3/2.4. A single-user license is priced at $99, plus $2 shipping and handling. The subroutines may be licensed for incorporation in programs for resale by special arrangement. For more information or a review copy, contact Certified Scientific Software, P.O. Box 802168, Chicago, IL 60680 (312) 326-6098. Send e-mail to: UUCP: {seismo,harpo,ihnp4,linus,allegra}!harvard!certif!herc INTERNET: certif!herc@harvard.harvard.edu Screen Manager Professional Updated To Version 1.5B Logical Alternatives, Inc. has released version 1.5B of the Screen Manager Professional for C programmers. The S.M.P. is a tool box of over 150 pre-written functions for complex windowing, menu generation and interactive context sensitive help features. To maximize performance and minimize memory overhead, the windowing functions are written in assembly language. The smallest possible program size using the S.M.P. functions is approximately 7K. The menu system, on the other hand, is written in C, providing flexibility and allowing the programmer to customize the functions. Other features include: keyboard filtering for data entry systems, OS and compiler independence, full video support, background processing, reconfigurable memory allocation, and a 300-page ring bound manual. This product also includes an event driven mouse support system, which makes S.M.P. comparable to a text-based Microsoft Windows programming interface. Full technical support is available including a new bulletin board for professional programmers: The LAB (814) 234-1881. The introductory price for S.M.P. v1.5B is $250 (with source code, $350). Screen Manager Professional supports Microsoft C, Borland's Turbo C, Watcom C, Lattice C, and Zortech C++. For more information contact Donald McCandless, Marketing Director, Logical Alternatives, Inc., Calder Square, P.O.
Box 10674, State College, PA 16805 (814) 234-8088, BBS: (814) 234-1881, FAX: (814) 234-6864. TE Version 3.0 Announced Sub Systems, Inc. has released TE Developer's Kit v3.0. The new version includes a TES small window editor routine. An application program can utilize TES without programming changes to the routine. The application program passes a set of parameters which specifies the window coordinates, maximum file size and an input buffer or an input file. The output is either a buffer or a file. The TES routine supports screen scrolling functions, word-wrapping, and block commands. It requires 60K of memory and supports Microsoft and Borland C compilers. The package includes the complete source code. This version of TE Developer's Kit retains the TE text editor source code and library routines from the earlier version. The package lists for $125. For more information contact Sub Systems, 159 Main St. #8C, Stoneham, MA 02180 (800) 447-6819 or (617) 438-8901. Powerline Updates Source Utilities Powerline Software, Inc. has released new versions of their programming utilities Source Print v4.0 and Tree Diagrammer v3.0. Powerline has added graphics drivers to support over 400 printers. The new features include support for many printers (including laser printers) and support for C, Pascal, and dBASE from a variety of language development companies. Both Source Print (a source code formatting utility) and Tree Diagrammer (an "organizational chart" diagrammer) are software tools for all PC programmers coding in C, C++, dBASE, Pascal, BASIC, FORTRAN, and Modula-2. For more information contact Powerline Software Inc. at their new address: 826 Douglass Street, San Francisco, CA 94114 (415) 346-8325. Emulator Mimes Xenix Console Hansco Information Technologies, Inc. has released its new terminal emulator system, HIT/Ansi. HIT/Ansi is a memory-resident program for MS-DOS compatible computers that emulates the Xenix color console.
The program may be called up while running any MS-DOS application with a hot key so that the computer functions as a terminal to a host Xenix machine. When the hot key is pressed again, the computer returns to MS-DOS and to whatever program was running. Using less than 48K of RAM, HIT/Ansi supports color (CGA, EGA, and VGA) or monochrome systems, 12 function keys and local printers in the foreground or background through the parallel port. A descriptive brochure and demonstration diskette for the product are available upon request. For more information contact Hansco Information Technologies, Inc., 185 West Ave., Ste. 304, Ludlow, MA 01056 (800) 548-9754 or (413) 547-8991. Saber And TI Join Efforts Saber Software, Inc., developer of Saber-C, has announced a joint software development agreement with Texas Instruments, Inc. Engineering teams from both companies are using Saber-C to cooperatively develop new software technology that will be used in software products TI and Saber plan to introduce in the future. Texas Instruments will also use Saber-C widely for its own internal development projects. Saber-C runs on UNIX, Sun Microsystems Sun-3, Sun-4, Sun 386i and SPARCstation workstations. Saber-C is also available for DEC's VAXstation and Ultrix. For more information, contact Saber Software, Inc., 185 Alewife Brook Parkway, Cambridge, MA 02138 (617) 876-7636; FAX (617) 547-9011. Watcom Ships v7.0 For 386 Hosts Watcom is now shipping the Watcom C v7.0/386 optimizing compiler and run-time library for the Intel 80386 architecture. Already available for the 16-bit MS-DOS environment with the 80X86 processors, Watcom C v7.0 is now available for the 32-bit 80386 processor. Watcom C v7.0/386 ports MS-DOS applications to 32-bit native mode, enabling full 386 performance without 640K limitations. Watcom C v7.0/386 generates code for 32-bit protect mode and can access large data areas without source modification or special compiler options.
Watcom C v7.0/386 takes advantage of 386-specific instructions, sophisticated addressing modes and 32-bit linear addresses. Porting to the 386 architecture involves recompiling existing programs and linking with the 386 library to enable addressing of up to 4 gigabytes of memory. Applications compiled with Watcom C v7.0/386 operate with MS-DOS extenders which enable use of 80386 protect mode. Both the 80386 software tools from Phar Lap Software and OS/386 from A.I. Architects support use of Watcom C v7.0/386 32-bit protect-mode with MS-DOS. Watcom C v7.0/386 includes the compiler, run-time library, a "compile and link" utility, Touch utilities, an object file disassembler, a patch utility, and the Watcom C Preprocessor. The list price for Watcom C v7.0/386 is $895. For more information, contact Watcom at 415 Phillip Street, Waterloo, Ontario, Canada, N2L 3X2 (519) 886-3700, FAX (519) 747-4971, or call the Watcom C order and inquiry line toll free: (800) 265-4555. Sterling Castle Offers Logic Gem In Single Language Versions Sterling Castle is shipping a "single language edition" of Logic Gem v1.5, its logic processor and code generator. This edition includes one of BASIC, FORTRAN, Pascal, dBase and C, plus English for documenting procedures, writing pseudocode, and building rule bases for expert systems. The products are identical except that one programming language choice appears in the language menu instead of five. LogicGem includes an editor, interpreter and compiler and runs on PC, XT, AT, PS/2 or compatibles. LG requires 640K of RAM, PC/MS-DOS 2.0 or greater and can be used with a color or monochrome monitor. LG's "Programmer's Edition" complete with documentation has a suggested retail of $99. The single language edition, sold only directly from Sterling Castle, is $49.95 with complete documentation and on 3.5" or 5.25" disks. The full purchase price of the single language edition is applicable against a later purchase of the multi-language programmer's edition.
There is a 90-day money-back guarantee, free technical support and 24 hour bulletin board service. Upgrades to v1.5 are free to registered users. Contact Sterling Castle, 702 Washington St., Ste. 174, Marina Del Rey, CA 90292. Inside CA (213) 306-3020 or (800) 323-6406; Outside CA (800) 722-7853; FAX (213) 821-8122. CI Adds Profiler To QNX Computer Innovations has added a new utility which provides statistical profiling of a program to the Computer Innovations C86 C Compiler for QNX. The profiler points out the parts of the program that use the most CPU time, reporting in terms of source file constructs that the programmer can easily relate to: module, function, or line number. The profiler is currently included with the C86 C Compiler package, and is available for downloading (by registered C86 users) from the Computer Innovations Bulletin Board Update System. For more information contact Computer Innovations, Inc., 980 Shrewsbury Ave., Tinton Falls, NJ 07724 (201) 542-5920. Spell Checker Works With C Geller Software Laboratories, Inc. has introduced SpellCode, a spell checker. SpellCode works with C, Pascal, BASIC, databases and Lotus spreadsheets as well as dBase and all work-alike interpreters and compilers. SpellCode includes a comprehensive English dictionary and a special dictionary of common computer terms. The user can also create as many custom dictionaries as needed. It is available from Geller Software Laboratories, Inc., 35 Stephen St., Montclair, NJ 07042 for a special introductory price -- $49.95. For more information call (201) 746-7402. MetaWare Available On System V/386 MetaWare's High C compiler will be offered on the Santa Cruz Operation (SCO) and AT&T UNIX System V/386 operating systems. The High C compiler features over a dozen different global optimizations, including global allocation of values to registers, removal of invariant expressions from loops, live/dead analysis, dead code elimination, and constant and copy propagation.
MetaWare's High C compiler also features a code generator that makes use of 386/387 instruction sets including support of in-line transcendentals and floating-point long doubles (80 bits). The code generator also features in-line intrinsic functions; in certain cases, the compiler replaces a call to the C library with the actual in-line instructions, resulting in code that is smaller and performs fewer operations. The High C compiler provides ANSI compatibility, cross-language calling, accurate and helpful diagnostics, and maximum configurability. Developers can select from a wide variety of compiler features through the use of toggles and pragmas. MetaWare supports the complete Intel 80x86 microprocessor family including the 8086, 80186, 80286, 80386, and 80486, and the Intel i860; Advanced Micro Devices' Am29K; Sun Microsystems' Sun386i, Sun-3, and Sun-4 workstations; Motorola's 680x0 family of processors; IBM's PS/2, RT, and 370; and DEC's VAX. Operating system support includes UNIX 4.x BSD, UNIX System V.x, SunOS, IBM's AIX, DEC's Ultrix, MS/PC-DOS, OS/2, DRI's FlexOS, AIA's OS/286 & 386, Phar Lap's 386DOS-Extender, DEC's VMS, and others. Most platforms are supported with native and cross compilers. For more information contact MetaWare Incorporated, 2161 Delaware Avenue, Santa Cruz, CA 95060-5706 (408) 429-6382; FAX (408) 429-9273. FairCom Announces Update For c-tree File Handler FairCom has announced c-tree File Handler/Server v4.3, which provides functions to store, update and retrieve fixed or variable length data in random or sequential order. c-tree comes with source code and employs a portable client/server architecture. The new version has a high speed sorted key load routine enabling virtually linear time index creation regardless of the number of index entries. Another function returns the key value at an approximate given percentile of the ordered key value list. The new version also estimates the number of entries between two key values.
c-tree v4.3 has new make files and scripts for OS/2, Watcom, MPW v3.0 and Commando tool support for all of MPW. There is server support for LightSpeed C on the Mac and server/client support for Turbo C. Reuse of depleted nodes in single-user and c-tree Server modes of operation is possible. Version 4.3C lists at $395 (plus shipping and handling). To order contact FairCom Corp, 4006 W. Broadway, Columbia, MO 65203, (800) 234-8180; FAX (314) 445-9698. Coromandel Releases C-Trieve For MS-Windows Environment Coromandel has announced the release of its C-Trieve ISAM file manager for MS-Windows. C-Trieve/Windows, now shipping, is based on the X/Open standard. It also runs under MS-DOS, XENIX, UNIX and DESQview. C-Trieve can be used by both C and C++ programmers. C-Trieve/Windows is a library of routines that allows the programmer to build custom data management applications. C-Trieve/Windows is based on a client-server model. A single server can support multiple clients and maintain application integrity using locking and transactions. C-Trieve/Windows is based on C-Trieve, which is the native file manager of Coromandel's RDBMS, C-SQL. The current offering includes dBase and Btrieve. C-Trieve users can upgrade to C-SQL and continue to use their files; no need exists to translate or modify the data for SQL access. For more information contact Coromandel Industries, Inc., 108-27, 64th Road, Forest Hills, NY 11375 (718) 997-0699; FAX (718) 997-0793. Eigenware Tech Offers CSL Buyer's Guide Eigenware Technologies now has available a 45-page buyer's guide for the C Scientific Programming Library. This guide provides a description of the CSL product and several other related products and services. These other products include compilers, editors, technical monographs, and TeX typesetting software used for CSL documentation. Detailed ordering and international shipping information is also supplied in the buyer's guide.
The guide is available for $5 from Eigenware Technologies, 13090 La Vista Drive, Saratoga, CA 95070. For more information call (408) 867-1184. QuickGeometry Receives Upgrade Building Block Software has released QuickGeometry Library v1.01, a collection of math subroutines for developing CAD/CAM, parametric design, NC programming, post processing, finite element analysis or other similar programs. The major enhancements are the addition of support for Turbo C, and internal changes that simplify interfacing to graphics libraries. The QuickGeometry Library provides CAD/CAM programmers with routines for standard geometric operations required for CAD/CAM software development. In addition, the QuickGeometry Library provides routines that read and write DXF files, and that manage lists. Selling for $199, the product includes source code, object code for MS-DOS, extensive documentation, working example programs, one hour of telephone support and a 30-day money-back guarantee. For more information contact Building Block Software, PO Box 1373, Somerville, MA 02144 (617) 628-5217. We Have Mail [Editor's Note: Yes, we omitted the listing from last month's letters column. It appears as a separate article in this issue, Dealing With Memory Allocation Problems. -- rlw] Dear Sir, It has been many years since I sent a letter to a periodical, some 16 or 17 years to be precise. I have some 22 years of programming background, ranging from systems programming to applications and telecommunications. As the original designer and author of SHADOW (IBM mainframe telecommunications system), and co-designer of MANAGE-IMS, I feel I can speak with some experience. I mention my background not to attempt to impress, but to add some weight to my words about the latest fad in the C world: C++. When C was first inflicted on us I welcomed it and disliked it; however, two facts stand out. First, K&R are undoubtedly very bright people with much insight.
Secondly, ANSI cleaned up the loose ends and now C is a serious commercial language. C is now one of the four languages that the IBM SAA endorses. I have written in C since 1982, using MVS, UNIX and the micro versions. Many years ago we in the mainframe world discovered the benefits of control blocks, pointers and vector tables. In fact the control block structure of any dynamic operating system is, no ifs, no ors, no buts about it, an object oriented programmed system. This "new" concept of O.O.P. (object oriented programming) is what worries me. First, it is not new. We have used object oriented systems for all the 22 years I have been in the industry. I have a fear that OOP will become OOPS. I feel that as far as C goes, C++ is violating the cardinal rule "IF IT ISN'T FIXED, DON'T BREAK IT"! I have studied OOP systems, the new window systems are OOP, and on the whole well done. They exist without the ?benefit? of C++. As it stands, C supports objects very well. I have an example in the C language forum of Compuserve, complete in and of itself for anyone who cares to study it. In short, C++ is a farce. C++ I feel was implemented by some well intentioned people who have no serious commercial programming expertise, and certainly no IBM mainframe internals experience. C++ is a random collection of items, a mixed bag of minor changes, and the OOP extension. The minor additions attack the heart of structured programming (for example allowing data to be defined anywhere code may exist). They had some good ideas, existing for a quarter of a century in the mainframe world, such as defaults. Yet the defaults are positional as opposed to pure keyword! When keyword parameters are introduced into functions and macros then a whole new world is opened up. C++ felt it was better to stick to methods flying in the face of good mainframe experience and thus limit its abilities.
The data reference, the change to casting, the inline functions are questionable at best, and ignore the potential increase in power of the processor and the optimisation ability of future compilers. Programmers are made to get involved with optimisation, not the machine. Overloaded functions I admit are a benefit. They are the base of the C++ object implementation. I ask myself if that benefit isn't perhaps the only benefit of C++. The object oriented side of C++ does nothing, except inheritance, that any C compiler today can do. And if serious preprocessors were defined with global symbols then inheritance can also be implemented. What I am saying is that rather than C++, let us have a full preprocessor with typical mainframe abilities, and skip the rest. C was designed to be bare bones, enhanced (very successfully) with functions. The quantum jump should be a preprocessor and proper macro and language preprocessor such as the IBM assembler macro facility. The next quantum leap is not the poorly thought out ideas of C++. In creating an object based system, much thought has to go into the structure, and this is true whether one uses C++, with its inheritance and scope, or C, where run time inheritance and binding are easily controlled in other ways. I am getting suspicious that perhaps AT&T felt it was losing control of its brilliant child, "C" and needed to show that perhaps they were still in the lead. I suspect that since OOP was becoming more the rage that they jumped on the bandwagon. They used that to reestablish their leadership. The C++ authors wanted to become the next generation of venerated programmers, to be the next K & R. I am sorry, but as Senator Bentsen put it, "they are no K&R". OOP was not invented by AT&T, it is a long established method for handling interrupt and interrupt driven systems. The resurgence of OOP came about with among other things the need to handle the dynamic world of dynamic objects such as in windowing systems and the like.
OOP is a good discipline where applicable. It has many uses in the distributed processing world of the future. I hope that the readers will take a closer look at C++ and study some OOP systems implemented in C and realise that C++ is a farce, a joke being perpetrated on the data processing world. I am all for positive change, this isn't it. I am recommending to my company that C++ not be implemented. I note that there will be no ANSI C++, they have seen the light. I thank you for your patience, Simon Wheaton-Smith 2902 N. Manor Dr. West Phoenix, AZ 85014 You're welcome to my patience, but not to any support for your position. I wonder if K&R had any IBM mainframe internals experience? If not, perhaps we should make them rescind C? -- rlw Dear CUJ, Please allow me to introduce myself. My name is Chris Proctor. I'm an IBM mid-range systems contractor. I felt compelled to write you a letter to tell you why I would not be renewing my C Users Group subscription. I am relatively new to C programming and I was hoping that your magazine would provide me with helpful hints and programming tips that would help me become a better C programmer. Unfortunately, in most issues I found nothing that was beneficial to me. Please believe me when I tell you that I am not "knocking" your magazine at all. I'm sure that if I was more knowledgeable in C, your magazine would be very interesting. But, quite frankly I don't understand half of the articles in each issue. What I would like to see is an article or section of each issue dedicated to the basics of C, or at least programming tips that the layman can understand. I can't believe that I am the only one that has not renewed my subscription because the articles are "over my head". Perhaps, something like I have mentioned may even increase subscriptions just from people glancing through the C Users Journal on the magazine rack. 
I realize that you have to appeal to the masses and not the exceptions and if that's the case, I'll probably subscribe to the magazine when I feel that it would be of some use to me. You have an excellent magazine. Keep up the good work. Sincerely yours, Chris Proctor 21352 Avenida Ambiente El Toro, CA 92630 I too would like to see some quantum of good tutorial material in every issue, in addition to the more demanding copy. Unfortunately, we don't get very many well-written tutorial submissions. If my readership includes some willing but uninspired authors, here's your chance. Send us a concise but thorough tutorial on some aspect of C. We need more such submissions than we are currently receiving. -- rlw Dear Howard, I was pleased that my article, "The C Programmer's Reference: A Bibliography of Periodicals," appeared in print in your January, 1990 issue. However, I was dismayed to learn that I had inadvertently omitted a couple of worthy entries. These annotations, with the appropriate citations, are as follows: C Gazette (quarterly, $6.50/issue, $21.00/year) C Gazette, 1341 Ocean Avenue #257, Santa Monica, CA 90401. A "code-intensive" quarterly which thrives on printing lots of C code (and some C++). Specializes in MS-DOS and OS/2, but no UNIX. An in-depth publication aimed at intermediate and advanced C programmers. Few advertisements and few reviews. For programmers who are serious about their C code. Journal of C Language Translation ($235.00/year) Journal of C Language Translation, 2051 Swan's Neck Way, Reston, VA 22091. An academic quarterly which just recently commenced publication. Aimed at compiler writers and programmers who must implement the ANSI standard in language products. Covers extensions to the standard, such as implementation of numerical representation, etc. No advertisements and few reviews. An important resource for programmers in this narrow niche. 
I had compiled the original bibliography some time ago, and from the holdings of a corporate library. I assumed that the library's holdings were relatively complete, and I overlooked the two periodicals above. I hope that this letter will fill the gap. I regret it if anyone was offended, and I trust that this information will further assist readers of The C Users Journal in their language research. Sincerely, Harold C. Ogg Chicago State University The Paul and Emily Douglas Library Ninety-Fifth Street at King Drive Chicago, Illinois 60628-1598 (For those wondering, Howard is our editorial coordinator. I should let him respond to this letter, but he's buried somewhere under some manuscripts and pasteups.) I appreciate the information. In addition to his column for CUJ, Rex Jaeschke also writes a C column for DEC Professional -- not a "C magazine", but at least another C resource. If you regularly refer to a C-related information source we failed to include, please write and we'll mention it here in a future issue. -- rlw Dear Mr. Ward: I'm glad the C Users Journal is starting to publish articles on the Macintosh, its development environment, and its operating system. Keep 'em coming! Nice article by Allan Brown [Bruton] in the October '89 edition. True, the Macintosh toolbox does add some additional complexity, but once one becomes accustomed to it -- and it may take quite a bit of time becoming fluent in "toolboxese" -- one can be assured that, by following the development guidelines and using the toolbox calls for window manipulations, there is less likelihood of code obsolescence and greater possibility of code portability among the various Macintosh hardware platforms and operating systems. Anyway, I tried executing the code presented on page 99 (Listing 1), and the code as written does not draw a set of nested rectangles as promised at the beginning of the article. 
When one executes the code specified in Listing 1, nested triangles are drawn on the screen. To obtain nested rectangles, the variable yb will have to be initialized to read yb = 25; rather than yb = 300; as printed in the article. That's the only change necessary for having the Macintosh draw nested rectangles. Thanks again for printing an article of interest to programmers who program the Macintosh in C. Yours truly, Clifford J. Campo 123 Fennerton Road Paoli, PA 19310 Gee, you mean rectangles have four sides? Maybe I should spend more time watching Sesame Street with my son. Thanks for the correction, and thanks for noticing our Macintosh coverage. We've really worked hard to get those stories. -- rlw Dear Robert: I'd like to offer several comments on your "Publisher's Forum" in the August 1989 issue. I like the new glossier paper; I think it makes the pages easier to turn because there's less friction between them. Goodness knows, we readers don't want too much friction. (Truly, I do like it better.) I can't tell you what a relief it is to read that you're refusing to get involved in C puns. At least in your articles. Your advertisers more than make up for it. (Of course, it's not just CUJ advertisers...) Too bad X3J11 didn't outlaw C puns as part of the ANSI standard. Regarding swimsuits, etc.: I agree that would be out of place in CUJ. There's plenty available elsewhere. However, your comment, "Wouldn't you rather explore lex than sex?" leaves me concerned. Have you somehow arrived at the assumption that real programmers are so obsessed with digital high tech that they will forego sex? Of course not. How do you think we burn off all of that Jolt and pizza? Not at a keyboard, surely! 
Speaking of sex and assumptions, and here I am finally being serious, there's a big one or two in your comment, "We've even considered running pictures of all the staff (especially the women since most of them are single and most of you are male)," namely that all male CUJ readers are straight. I assure you, it ain't so! About 10% of most any population is gay and lesbian, and while I haven't seen any polls to confirm that this is true of programmers, I have no reason to believe otherwise. So, if you were to do swimsuits, it would only be fair to include your female and male staff. Fair to your straight women readers too, don't forget them! CUJ is great, please keep it up (speaking of standards and high ones at that)! Sincerely, Bill Lee 5132 106A Street Edmonton, Alberta, CA T6H 2W7 What can I say? -- rlw To The C Users Group, Concerning Numerical Software Tools in C. It is a fine book for those starting to program in C. For any book in your Advanced topic area, I, as well as all others, assume that Advanced means just that -- Advanced! An advanced book would be like Numerical Recipes in C by Press et al. from Cambridge University Press. You truly need to re-analyze what is considered advanced, given that more and more books actually treating advanced topics are coming out. In the past, few knew anything about C. Since it is now the #1 language of choice, advanced isn't the advanced of yesterday. The book which I'm sending back should be considered elementary to intermediate. That it was published in 1987 does not mean that it is advanced. Further, four routines of the most elementary type do not, in my view, constitute "Tools". Tools to me are a compendium of primitives that one may use in developing one's own applications. This book falls way short of that. Again further, the price is outrageous for what one receives. Jerry Rice, Ph.D. 504 Eastland St. El Paso, Texas 79907 In all truth, I haven't read this book. 
In fact, there are more than a few books among the 100 or so that we carry that I haven't read. Except when I have personal knowledge of the book's contents, we rely upon the publisher's description when categorizing the book. -- rlw User Interface Language Eases Prototyping Vincent Guarna and James Krause This article is not available in electronic form. Using 'Screen Machine' Rick Knoblaugh Rick Knoblaugh is a Systems Engineer specializing in systems programming for PCs. He is the coauthor of Screen Machine, a screen design/prototyping/code generation utility. He may be reached at 15014 River Park Dr., Houston, TX 77070. Prototypes and code generators can significantly reduce development costs. In this article I'll discuss a recent consulting project and show how the "Screen Machine" -- a prototyping tool which I am making available to other programmers as shareware -- assisted in prototyping, generating C code for the user interface, and documenting the system. The Application My project was a student grade tracking application for a high school. The software allows student names and grades to be scanned into a PC clone using an optical mark reader, a scanning device which reads forms which have been marked with a pencil. Student names and grades can also be manually entered or edited. The product enables teachers to maintain their grade books on a PC. Grade tracking and printing tasks, such as letters to parents, are all handled in a menu-driven environment. Thus, the application required menus, data entry screens, and help screens. I began by planning the major components of the software, such as the scanner communications and the decoding of the scanned data. Next I needed to develop a user interface from which all program functions could be selected. For this phase the user interface prototyping software was invaluable. 
Benefits Of Prototyping In the past, programmers who developed interactive programs painstakingly designed the appearance of screen displays on paper and then wrote the code for these user screens. Today, many developers are using some type of screen prototyping software. Most prototyping tools permit screen design using a powerful screen editor. Screen editors make it much easier to manipulate blocks of data, to center screen data, and to experiment with color and other aspects of screen appearance. In addition to a screen editor, prototyping packages usually include some control facility that allows branches to various screens to depend on user input. This allows the developer to create the "look and feel" of a user interface before any code is written. Prototyping also lets the user become more involved in the design of the user interface. More importantly, it allows the programmer to be more creative and to develop an interface that makes sense. Some prototyping tools also provide code generation for the screen displays. Once the screen design is finalized, the program automatically generates the associated source code. Screen Machine Screen Machine runs under MS-DOS and consists of a screen editor/code generator, a mini-language for prototyping the flow of application screens, and a TSR screen capture program which allows any text mode screen to be imported into the screen editor. Screen Machine can generate source code for screens in your choice of C, BASIC, Turbo Pascal, 8086 assembler, and dBASE. Screen Machine is limited to handling display portions of screens only; it does not handle data input. The prototyping module permits the input of single keystrokes, allowing screens to be displayed when the operator selects a menu option or presses a specific key. Designing Screens With SCREEN I experimented with the appearance of the grade tracking application screens using Screen Machine's screen editor and code generator, SCREEN.EXE. 
As with most applications, I started with the main menu (Figure 1). The SCREEN box drawing feature makes it easy to put borders around menus and other screens. Text can be centered on a given line of the screen or within the graphics character borders of a drawn box. You can even shift the entire screen left or right to aid in centering screen data and attributes. Other screen editor features include: inserting and deleting lines, copying and moving blocks, selection of color, reverse video, undo of last editing function, keystroke macros, and online help. I saved my designed application screens in Screen Machine screen data files. (Screens can be saved with or without attributes.) If no color or reverse video is needed, the screens can be saved as ASCII text files. Prototyping The Interface Once the data files for all application screens are complete, the programmer develops an executable simulation of the application interface using Screen Machine's mini-prototyping language module, SHOW.COM. The completed simulation will display the main menu, accept keystrokes, and based on these keystrokes, select other application screens for similar processing. The SHOW mini-language consists of display/keystroke input statements, case statements, and goto and gosub statements. The heart of these is the display/keystroke input statement, whose syntax is:

Filespec [basekey max] [/Tn] [/An] [/Xn]

Filespec names the screen data file to be displayed. (e.g. I saved my main menu screen data file in C:\GRADE\MAINMENU.SRN.) The basekey is optional and represents the lowest-valued key accepted as input from the user when the screen is displayed. The basekey is one of these: A specific key, enclosed in quotation marks (e.g. "1"). A decimal scan code value (unquoted) (e.g. 59 for the <F1> key). An unquoted asterisk (*), which is taken to mean "any key". The max cannot be specified unless basekey is specified; it is the highest-valued key accepted as input. 
If input from a given screen falls neatly within a range of keystrokes (e.g. if on my main menu only "1" to "9" were used, and not <Alt><H>), specifying basekey and max eliminates all unwanted keystrokes. The T switch specifies a time value in seconds -- useful for creating timed "slide shows". SHOW will display the screen data file and then wait n seconds (0-255) before displaying the next screen. The A switch displays a screen data file in a certain attribute. This is generally only used if you have not saved attributes in your screen data files. The X switch is the key on which a "getout" is performed. "n" is specified in the same manner as basekey and max, i.e. either a quoted character or an unquoted scan code. A "getout" is accepted as a valid key press and performs any pending return or else returns to the operating system. Case statements allow branches to other portions of a SHOW command file to depend upon keystrokes input via the display/keystroke input statement. The syntax for the case statement is:

case [key] [range] [S: G: R:] [label name]

If a keystroke matches key or falls within range, control is transferred to label name. If S: is present, the transfer is executed as a gosub, meaning the address of the next display/keystroke input statement is put onto the "stack" and control is transferred to the label. G: does a goto transfer to the label. R: returns to the label (similar to BASIC). The syntax for labels is the same as in MS-DOS batch files (i.e. a ":" followed by a label name). The grade tracking SHOW command file appears in Listing 1. The top of the command file displays the main menu which is stored in the Screen Machine screen data file, MAINMENU.SRN. The asterisk after the file name instructs the SHOW program to wait after displaying the main menu until some key is pressed. The /X indicates that if a 9 is pressed, the SHOW command file should terminate and return to MS-DOS. The case statements perform gosubs to other labels in the command file. 
For example, if the user presses a 6, SHOW will gosub to the label otherprint where the print options menu is displayed and processed. The strange-looking NUL screen data file name followed by the case * G:top is necessary because the limited SHOW command set only allows unconditional branching to be initiated in case statements. Case statements can only be performed after a screen data file has been displayed by a display/keystroke input statement. The reserved screen data file NUL merely satisfies the case statement by simulating a screen display and a keystroke entry. The asterisk indicates that if any key is pressed, a goto should be performed to the label top. After the appropriate gosub is processed from the main menu, control transfers back to the top of the command file. Generating The Source Code A SCREEN program configuration option allows you to select the language to be generated. When SCREEN generates C code, it declares a structure _scrn and defines a global array of structures of type _scrn (Listing 2). Notice that the array of structures is named with screen_ followed by the name of the screen data file to prevent naming conflicts. After including these statements in your program, you can either write a routine to display the arrays of structures, or include the routine supplied with Screen Machine in your program as in Listing 3. The routine uses the BIOS software interrupt 10h function 9 to display the arrays of structures. Function 9 writes a character and attribute at the current cursor position. The Microsoft C library function _settextposition is used to position the cursor. The function disp_screen is called passing the name of the array of structures to be displayed and a flag indicating whether the screen should be cleared prior to displaying the data. disp_screen clears the screen using the background color defined in the variable color_back_grnd. This should be set to the desired background color. 
Data Entry Because Screen Machine handles only display portions of screens, I used my own general-purpose data entry routines for those portions of the application where data entry was required. Screen Capture A third Screen Machine module, CAPTURE.COM, is a TSR program that allows text mode screen displays to be captured and stored on disk. This utility makes it easy to include application screens, complete with sample data, in the user's manual. CAPTURE takes over the shift print screen function (interrupt 5). When the program is invoked and becomes resident, command line options specify the file name under which screen data files are to be stored and whether or not attributes should be included in the screen data files. If attributes are not desired, screen data is stored in ASCII text files. Captured screens can also be used as input into the screen editor/code generator. This means that any text mode screen can be translated into source code which will display that screen in any or all of the five supported programming languages. This capability can be used when translating an application from one language to another or if you want to generate source code for screens created with a prototyping tool that doesn't support source code generation. Conclusion Screen Machine does several things satisfactorily. Its lack of support for input fields may preclude your using it for some applications. Certainly, if you need really detailed simulations of your programs, such as sound effects and emulation of disk I/O, you should use a more full-featured commercial prototyping program. Also, if you require a graphics interface, then Screen Machine will not help you.

Figure 1

Grade Book Main Menu
1) Scan Grades
2) Edit/View Grades
3) Print Grade Book
4) Scan Names
5) Print Rosters
6) Other Print Functions
7) Set Teacher Information
8) Drop Lowest Grade
9) Exit
For help, press <Alt> <H>.

Listing 1

/*SHOW command file for grade tracking program. 
/* ----------------------------------------------
:top
mainmenu.srn * /x"9"      /*display main menu, accept any key,
                          /*exit to dos if "9"
case "1" s:scangrades     /*gosub to display the appropriate screens
case "2" s:editgrades
case "3" s:printgrades
case "4" s:scannames
case "5" s:printrosters
case "6" s:otherprint
case "7" s:setteacher
case "8" s:droplow
case 35 s:mainhelp        /*<Alt><H> help
/*When all gosubs return, branch back to top. You can only branch
/*as part of a case statement and you can only have a case statement
/*after a display/keystroke input statement. Thus, the special NUL
/*screen name can be used to branch anytime.
nul                       /*special reserved display/keystroke input statement
case * g:top              /*branch back to top of command file
/*-----------------------------------------------
:scangrades
scangrad.srn *            /*display scan grades screen and wait for a key
case * r:                 /*return
/*-----------------------------------------------
:editgrades
editgrad.srn * /x1        /*display edit/view grades screen and wait for
                          /*a key, return to caller if esc (scan
                          /*code 1) is pressed
case 35 s:edithelp        /*if <Alt><H> (scan code 35) is pressed, go
                          /*display the edit/view grade help
nul
case * g:editgrades       /*go back to edit/view grade screen
/*-------------------------------------------------
:printgrades              /*display print grades screen
prtgrade.srn *
case * r:
/*-----------------------------------------------
:scannames                /*display scan names screen
scanname.srn *
case * r:
/*-----------------------------------------------
:printrosters             /*display print rosters screen
prtrost.srn *
case * r:
/*----------------------------------------------
:otherprint               /*display other print options menu
prtmenu.srn "1" "6" /x"6" /*accept only 1-6, return to caller if 6
case "1" s:report1        /*branch to report 1 screen
case "2" s:report2        /*branch to report 2 screen
case "3" s:report3        /*branch to report 3 screen
case "4" s:report4        /*branch to report 4 screen
case "5" s:report5        /*branch to report 5 screen
nul
case * g:otherprint 
/*------------------------------------------------
:report1
report1.srn *
case * r:
/*------------------------------------------------
:report2
report2.srn *
case * r:
/*------------------------------------------------
:report3
report3.srn *
case * r:
/*------------------------------------------------
:report4
report4.srn *
case * r:
/*------------------------------------------------
:report5
report5.srn *
case * r:
/*------------------------------------------------
:setteacher
setteach.srn *            /*display set teacher information screen
case * r:
/*------------------------------------------------
:droplow
droplow.srn *             /*display drop lowest grade screen
case * r:
/*------------------------------------------------
:edithelp
edithelp.srn *            /*display edit/view help screen and return to
case * r:                 /*caller when any key is pressed
/*------------------------------------------------
:mainhelp
mmhelp.srn *              /*display main menu help screen and return to
case * r:                 /*caller when any key is pressed

Listing 2

struct _scrn {
    char *chrs;           /*pointer to screen text*/
    char cw;              /*column where text appears*/
    char rw;              /*row where text appears*/
    char att;             /*attribute in which text appears*/
};

struct _scrn screen_mainmenu[]={
    /* ...array contents as shown in Listing 3... */

Listing 3

/*various include files*/
#include <dos.h>          /*union REGS, int86*/
#include <graph.h>        /*_settextposition, _setbkcolor, _clearscreen*/

#define FALSE 0
#define TRUE 1
#define VIDEO 0x10        /*software interrupt 0x10 */
#define WRITE_ATTR_CHAR 9 /*function 9 */

struct _scrn {
    char *chrs;
    char cw;
    char rw;
    char att;
};

void disp_screen(struct _scrn *, unsigned short );

struct _scrn screen_mainmenu[]={
{"╔════════════════════════════════════╗",21,6,31},
{"                                      ",21,7,31},
{"         Grade Book Main Menu         ",21,8,31},
{"                                      ",21,9,31},
{" 1) Scan Grades                       ",21,10,31},
{" 2) Edit/View Grades                  ",21,11,31},
{" 3) Print Grade Book                  ",21,12,31},
{" 4) Scan Names                        ",21,13,31},
{" 5) Print Rosters                     ",21,14,31},
{" 6) Other Print Functions             ",21,15,31},
{" 7) Set Teacher Information           ",21,16,31},
{" 8) Drop Lowest Grade                 ",21,17,31},
{" 9) Exit                              ",21,18,31},
{"                                      ",21,19,31},
{" For help, press <Alt> <H>.           ",21,20,31},
{"╚════════════════════════════════════╝",21,21,31},
{"\0",0,0,0}
};

main()
{
    disp_screen(screen_mainmenu, TRUE);  /*clear screen and then display the
                                           screen defined by screen_mainmenu*/
}

long color_back_grnd= 1;  /*all screens will use a blue background*/

/*-----------------------------------------------------------
   disp_screen - Use ptr passed to array of structures containing
   &text; col; row; and attribute. Use BIOS int 10h function 9 to
   display the data. If cls_flag is TRUE, clear the screen before
   displaying the data. When clearing the screen, use the attribute
   defined in the variable color_back_grnd
------------------------------------------------------------*/
void disp_screen(p, cls_flag)
struct _scrn *p;
unsigned short cls_flag;
{
    char wcol;
    char * wsptr;
    union REGS inregs, outregs;

    if (cls_flag)
        {
        _setbkcolor(color_back_grnd);
        _clearscreen(_GCLEARSCREEN);
        }
    inregs.h.ah = WRITE_ATTR_CHAR;       /*print char and attribute*/
    inregs.x.cx = 1;                     /*print 1 char*/
    while ( *(p->chrs) )
        {
        wsptr=p->chrs;                   /*get ptr to string*/
        wcol=p->cw;
        inregs.h.bh = 0;                 /*video page 0*/
        inregs.h.bl = p->att;            /*attribute to use */
        while (inregs.h.al = *wsptr++)   /*char to print*/
            {
            /*position the cursor*/
            _settextposition( (short) p->rw, (short) wcol++);
            int86 ( VIDEO, &inregs, &outregs );  /*print with BIOS*/
            }
        p++;
        }
}

Prototyping Experiences
Brett Martensen

Brett Martensen is a Senior Systems Consultant with SRI Strategic Resources, Inc. He specializes in tools and techniques, including CASE, prototyping and JAD, to develop database applications. Areas such as entity relationship data modeling are his forte. He has an M.Sc. in Computer Science (1976) from Queen's University (Kingston, Ontario).

When developing a prototype, one is faced with reaching a maximum level of functionality across the maximum scope of the application, but within a minimum time frame. 
Two productivity tools help reach these conflicting goals: the CASE (Computer Aided Software Engineering) tool, which is the specification engine, and a DBMS (DataBase Management System), which is the application engine. I recently participated with a team to develop a prototype system for Canada Post Corporation. This prototype had to be robust enough to be used across the country, at a number of different user sites during a three-month trial. Thus, it had to be more functionally complete than would normally be expected of a prototype. Background A prototype is a miniature system which approximates the final system but provides only a subset of the application's scope and functionality. As such, a prototype comes with all the benefits associated with modeling. A model is easier and certainly less expensive to change than a real system. Prototyping permits developers to elicit, model and then capture user requirements for a system. Like the buildings on a movie set, a prototype must look real even though it is only a facade. On a movie set, certain buildings have rooms, some of which are furnished. Similarly some features in a prototype are fully implemented, while others remain as images only. There are four levels of functionality used when describing a prototype:

Level   Functionality
one     Screens only.
two     Screens with field entry and edit, some controllable flow.
three   Level two plus Create, Retrieve, Update and Delete of data and Menu linking screens together.
four    Level three plus Integrity checking, a correctly structured database and some application specific algorithms working.

Most prototypes end up as a mixture of these levels applied to different parts of the application. The prototype developer must be able to respecify the data model and quickly regenerate the database structure because a prototype is not a static model; it goes through a number of iterations. 
A typical prototype development iteration consists of analyzing the requirements (read documentation, conduct interviews); specifying (data model, functional specifications); designing (display layout, reports); developing (program, fill in tables of data); demonstrating and using; and finally, reviewing the design with users using Joint Application Design (JAD) techniques. The JAD is a meeting in which a consensus can be reached amongst the user population as to the system requirements. The results of the JAD provide the requirements for the start of the next iteration. Theoretically, this cycle should be repeated three times during a prototyping project. Ideally, each iteration of the prototype progresses closer to the specification for the final system. A rule of thumb is that sixty percent of the remaining requirements are captured in each cycle. After the second cycle, the system should be 84 percent correct and after the third cycle, 93+ percent correct (rather like golf where each shot gets you closer to the hole). At some point, however, diminishing returns make further iterations pointless. The Canada Post Prototype The Canada Post Corporation prototype was developed using E-R Designer from Chen and Associates, a CASE tool for data modeling to specify the data model; and ZIM from Sterling Software, Zanthe Systems Division, a powerful 4GL DBMS to develop the prototype. ZIM is a natural choice for prototyping projects. It directly implements an Entity-Relationship data model which it keeps in an Active and Integrated Data Dictionary. An entity-relationship database structure allows transferring the conceptual data model produced in the specification stage directly into the database data dictionary. An Active data dictionary is important because the programs can access all the metadata. An Integrated data dictionary stores display specification information as well as the database structure. 
An advantage of choosing E-R Designer is that the data model can be exported and used to create the ZIM database description without rekeyboarding. ZIM also runs code in either interpreted or compiled mode. Since prototype performance is not a consideration, the interpreted mode is used. This mode has a macro substitution capability which allows names of entities, relationships, displays, and ZIM routines to be substituted in the programs and resolved at run time. The data dictionary stores all these names, as well as segments of ZIM code for macro substitution. Thus, the data dictionary becomes the repository of the prototype specifications, data, displays and programs. Other tools such as WordPerfect and internally developed ZIM utilities increased productivity further. We used WordPerfect's line draw facility to draw boxes and rapidly design the displays. Then, below each display, the positions of all the fields and their prompts were specified. A short ZIM program analyzed this information and created the display form specifications in the ZIM data dictionary. One of ZIM's most useful prototyping features is the way in which display forms relate to entities in the database. If a field on a display form is given the same name as a field in an entity, then the ZIM command CHANGE Form FROM EntitySet fills in the fields of the current form from the fields of the same name in the current record of the entity set. Similarly, ADD EntitySet FROM Form creates a new record in the entity set from the data entered on the form. This functional relationship between display form fields and entity set fields is reinforced by maintaining the metadata in the data dictionary for each entity set field. (See Table 1.) The other ZIM database information normally needed (length, decimals, indexed, and required) is also stored in the data dictionary. Other useful attributes, such as default value, data mask and validation rules, exist for the fields in the display form. 
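The name matching behind CHANGE Form FROM EntitySet can be sketched in a few lines of C. This is only an illustration of the idea (the field structure and function here are invented for the sketch; ZIM itself is a 4GL and does this internally through its data dictionary):

```c
#include <string.h>

/* Hypothetical field: a name plus a displayable value. */
struct field {
    const char *name;
    char value[40];
};

/* For every destination field whose name also appears in src,
   copy the source value across -- the essence of filling a form
   from the current record of an entity set. */
void fill_by_name(struct field *dst, int ndst,
                  const struct field *src, int nsrc)
{
    int i, j;

    for (i = 0; i < ndst; i++)
        for (j = 0; j < nsrc; j++)
            if (strcmp(dst[i].name, src[j].name) == 0) {
                strcpy(dst[i].value, src[j].value);
                break;
            }
}
```

Because the match is by name, the form and the entity set need not list their fields in the same order, and fields unique to either side are simply left alone.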
We developed a number of generic, table-driven modules for the prototype's functional side. The concept of table-driven software is extremely powerful: by simply modifying the data in the tables, the developer can rapidly change the way in which a module functions, what it operates on and how it appears to the user. For example, we developed a menu program which was totally table driven. By changing the tables, the menu structure, the menu functions, and the routines executed when menu items were chosen could all be modified. With the table-driven approach, code can be reused. For example, a single modify routine can be applied to any entity set, and an enhancement made to a routine is universally available in the prototype system. Thus, the only nonreusable code is the application-specific algorithms, which are all linked in via the program name attribute attached to each field. For the Canada Post Corporation prototype, generic table-driven routines were developed for:

Menu
Level 1 and 2 screens (slide show)
Entityset Lister, which provided access to the functions of: Sort list, Pick record, Print list, Find record, Add record, Modify record, Delete record, and Help

This list doesn't include "Reporting" since programming a general-purpose reporting module is difficult to perform using the table-driven approach. The Print list routine provided simple reports. More complex reports which needed to function were hand-coded in ZIM. Reports that did not need to function were presented as Level one screens. There exist both a spreadsheet and a business graphics package which take data from a ZIM database and allow for its manipulation, analysis and presentation. Given that the Canada Post Corporation application involved large quantities of statistical information, both these packages were linked into the prototype to assist the user in performing ad hoc inquiries and analysis of the data. 
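The table-driven menu program described above translates naturally into C, where the "table" is an array of label/routine pairs. The sketch below is hypothetical (the actual prototype was written in ZIM, and every name here is invented for illustration), but it shows why only the table need change when the menu does:

```c
#include <string.h>

/* One menu entry: a label and the routine run when it is chosen. */
struct menu_item {
    const char *label;
    int (*action)(void);
};

/* Placeholder routines standing in for real application functions. */
static int add_record(void)    { return 1; }
static int delete_record(void) { return 2; }

/* The table IS the menu: add, remove, or reroute items by editing
   only this data, never the dispatch code below. */
static struct menu_item menu[] = {
    { "Add record",    add_record },
    { "Delete record", delete_record },
    { 0, 0 }                     /* table terminator */
};

/* Generic dispatcher: find an item by label and run its routine;
   returns 0 if the label is not in the table. */
int run_item(const char *label)
{
    struct menu_item *m;

    for (m = menu; m->label; m++)
        if (strcmp(m->label, label) == 0)
            return m->action();
    return 0;
}
```

The dispatcher never changes; enhancing it (say, adding logging) instantly benefits every menu in the system, which is exactly the reuse argument made above.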
These two packages were especially useful for designing new reports and re-creating existing ones to elicit user feedback. They added substantial functionality to the prototype with very little effort. The modules of the prototype were linked together in the calling sequence as shown in Figure 1. This prototype environment can be easily extended as new generic routines are developed. A table-driven ad hoc inquiry function could be built and linked in via the menu. Since the purpose of a prototype is to develop a working model with frequent feedback from the user population, it is appropriate to add a feature which captures users' ideas while they are fresh in their minds. Since "help" was available throughout the prototype, a suggestion box feature was added to the "help" module which allowed for on-line and in-context idea capturing. These ideas were collected, printed and analyzed during the JAD sessions. User feedback was very general on the first iteration: "We also need to be able to store information about our services," for example. In subsequent iterations, suggestions became more specific: "Use the word 'item' rather than 'product'." The final goods delivered from a prototyping project consist of the working prototype and a large quantity of documentation. The documentation covers the working prototype and user requirements that were not implemented in the prototype, such as complicated application-specific algorithms or feedback from the final JAD. Conclusion The combined use of CASE tools and 4GLs allows for greater productivity in prototype development. Using generic table-driven modules results in less software development. As a result, the workload shifts to the specification and analysis tasks. As in so many situations where technology is applied to the automation of the simpler tasks, fewer people are required, but the ones remaining need more expertise. 
The skills of business and functional analysis, data modeling, and design become more important than programming. Reference Application Prototyping: A Requirements Definition Strategy for the 80's, Bernard H. Boar, Wiley-Interscience Publication, John Wiley & Sons, New York, 1984. 
Figure 1 
Name -- The unique name given to this data element. 
Type -- Whether alpha-numeric, integer, character, etc. 
Prompt -- The full user name to appear on displays. 
Column Header -- An abbreviated name to appear at the top of any list. 
Program Name -- The name of the ZIM routine to be executed whenever the value of the field is changed. 
Helptext -- A user-readable explanation of the data element, its possible values and purpose, if necessary. 
Figure 2 
The UI2 Code Generator Paul Combellick Paul Combellick has a BS in Petroleum Engineering from the University of Alaska, Fairbanks. He is a contract programmer specializing in dBASE and C local area network (LAN) database applications and can be reached at (602) 280-2569 or via Compuserve at 70671,3054. As a Network DBMS applications developer, I recently undertook my first major C project. After having scoffed at UI2 for several months, I decided to give it a try as a fast prototyping tool. The resulting productivity gains exceeded my most optimistic expectations. I was able to produce about fifty percent of the 20,000 lines of C code in this application using UI2. I used UI2 with the Vermont Views Screen library and the Btrieve Record Manager to build a Network DBMS application. As I was new to all three tools, I spent a third of the project time learning them, with the remaining weeks actually producing generated and hand-written code. By the time I completed this single DBMS application, I had produced a set of templates and template libraries, in UI2 terminology, that would allow me to produce the next Network DBMS application in a few weeks, rather than in the months it would take if the code were entirely written by hand. 
Description Of UI2 UI Programmer Version Two, The Developer's Release, by Wallsoft is a programmable code generator targeted toward dBASE programmers, but it is flexible enough to be used for many languages in the MS/PC-DOS environment, including C. UI2 contains four major components. 
Screen Editor -- The user can interactively draw screens and define screen entities including menus, background text, boxes, variables, and fields. 
Templates -- Code generation language files that define how the code for a particular screen will be created. 
Template Libraries -- A library contains groups of functions written in UI2's generation language that are called by the templates during the generation of the target source code file. 
Code Generator -- An interpreter that executes the template language to generate the target source code for a particular screen. 
UI2 is shipped with a set of templates and template function libraries for dBASE programmers. The C programmer will have to create his own templates before any non-trivial C code, other than simple menus, can be generated. Case Study For this application the client specified C for portability to OS/2 and UNIX. The target system is a Novell Network which supports the Btrieve database server. Btrieve also has versions for OS/2 and Intel-based PCs running UNIX. I chose the Vermont Views screen library for its portability to UNIX and OS/2. This networked DBMS application contains several types of screen input/output: menu-only screens, data entry-only screens, combined menu and data entry screens, and reports. I created three templates, one for each of the first three screen types. These templates are actually source code files, written in the code generation language, that are executed by the code generator's interpreter. The templates describe how the code generator should handle a particular screen object to produce the target source code. 
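At its core, such a generator performs textual substitution driven by the screen definition. As a rough illustration only (UI2's template language is far richer and is interpreted, not C), here is a minimal C sketch of replacing one placeholder, the way "{menuname}" in Listing 1 becomes "CUG" in Listing 2; the function and its names are my own.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the substitution step a code generator
   performs: replace the first occurrence of key (e.g. "{menuname}")
   in a template line with the screen's value (e.g. "CUG"). */
int expand(const char *tmpl, const char *key, const char *val,
           char *out, size_t outsz)
{
    const char *p = strstr(tmpl, key);
    size_t pre;

    /* fail if the key is absent or the result would not fit */
    if (p == NULL || strlen(tmpl) - strlen(key) + strlen(val) >= outsz)
        return -1;

    pre = (size_t)(p - tmpl);
    memcpy(out, tmpl, pre);      /* text before the placeholder */
    out[pre] = '\0';
    strcat(out, val);            /* the substituted value       */
    strcat(out, p + strlen(key));/* text after the placeholder  */
    return 0;
}
```

A real template engine would loop over many placeholders and many lines, but every generated statement in Listing 2 is ultimately produced by this kind of replacement.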
I painted the screens by drawing boxes, menus and their actions, background text, and defining data entry fields. Through the screen editor the user may specify menu attributes such as hot keys and the name of a function to call when the menu is selected. The editor also allows specification of field attributes such as type, width, picture and provision for such user-designed features as begin-field and end-field trigger and validation functions. The programmer designs the template language functions to take advantage of the entities and their attributes defined in the screen editor. UI2 has an interactive mode as well as a command line mode that allows UI2's code generator to be accessed by make. A make response file can include the dependencies for generating target source code files from screen definitions, as well as compiling and linking the entire system. I can now modify a template file and make will call UI2 to regenerate all the affected source code modules. To modify a screen definition file -- either by adding text or a data element, or changing one of the many field, form, or menu attributes -- I don't edit the C source file; instead, I modify the screen definition file using the UI2 screen editor. I design screens so that I never modify the UI2 generated C code files except through the UI2 screen editor. UI2 is not limited to any particular coding style or third-party library and is adaptable to many different compilers. UI2 was used on this project to generate 40 screens and about 10,000 lines of C code. The code generation language syntax is very dBASE-like and the learning period was brief. UI2 Strengths This type of code generator performs very well on repetitive tasks such as building screens. I was able to build all the screens -- both menu and data entry -- entirely with UI2. After learning Vermont Views, Btrieve, UI2 and building templates, I will probably be able to reproduce 40 new screens for a new application in a few days. 
More importantly, the generated C code will be free of syntax errors and errant pointers. This bug-free code is at least as important as the productivity in creating the code. Once the templates are debugged, future screens will be virtually free of syntax errors. On the very first project, I used UI2 to boost productivity significantly, despite a learning period to become familiar with a new tool. Limitations In light of the fact that UI2 was designed with the dBASE programmer in mind, it lacks a couple of features for the C programmer. The most obvious feature missing is a full-featured dictionary that supports C data types, including structures and scoping concepts, and general data file schema beyond the dBASE file support. However, I was able to work around most of the data dictionary limitations by creating a hidden box in each screen. The box, made up mostly of #includes and external declarations, contained code for the generator to insert literally into the generated C source code. Conclusion I am quite satisfied with UI2. I have created templates for my non-programmer partner to fast prototype systems for prospective clients in order to illustrate what a proposed system may look like. I believe that UI2 will boost my coding and debugging productivity by factors of five to ten in the area of screen generation and maintenance. On future projects I expect to realize tremendous productivity gains now that I am familiar with this tool and have created a set of templates and template libraries to create code that utilizes Vermont Views Screen Library. Listing 1 The fragment of the template in Listing 1 expands to produce the C code in Listing 2. 
/*********************** define the form ***************************/
<> /* define a form */
{menuname}_dfmp = fm_def( {formbox.row}, {formbox.col},
    {formbox.height}, {formbox.width}, LNORMAL, BDR_NULLP );
/* define boxes around form items ****/
<>
/*********** define background text */
<>
sfm_help( "*DATA HELP", {menuname}_dfmp ); /* define form help keyword */
<>
/******* define form data fields *********/
<>
Listing 2 
/*********************** define the form ***************************/
/* define a form */
CUG_dfmp = fm_def( 0, 0, 21, 80, LNORMAL, BDR_NULLP );
/* define boxes around form items ****/
bg_boxdef( 0, 0, 21, 80, LNORMAL, BDR_SPACEP, CUG_dfmp );
bg_boxdef( 5, 14, 11, 52, LNORMAL, BDR_DLNP, CUG_dfmp );
/*********** define background text */
bg_txtdef( 1, 28, "C USER'S GROUP UI2 DEMO", LNORMAL, CUG_dfmp );
bg_txtdef( 2, 28, " ", LNORMAL, CUG_dfmp );
bg_boxdef( 5, 14, 11, 52, LNORMAL, BDR_DLNP, CUG_dfmp );
bg_txtdef( 7, 19, "Name : [ ]", LNORMAL, CUG_dfmp );
bg_txtdef( 8, 19, "Address : [ ]", LNORMAL, CUG_dfmp );
bg_txtdef( 9, 19, "City : [ ]", LNORMAL, CUG_dfmp );
bg_txtdef( 10, 19, "State : [ ] Zip : [ - ]", LNORMAL, CUG_dfmp );
bg_txtdef( 12, 19, "Phone : [ ]", LNORMAL, CUG_dfmp );
bg_txtdef( 13, 19, "Fax : [ ]", LNORMAL, CUG_dfmp );
sfm_help( "*DATA HELP", CUG_dfmp ); /* define form help keyword */
/******* define form data fields *********/
CUG_fld1 = fld_def( 7, 33, NULLP, FADJACENT, "!!!!!!!!!!!!!!!!!!!!!!!!!",
    F_STRING, (PTR) name, CUG_dfmp );
CUG_fld2 = fld_def( 8, 33, NULLP, FADJACENT, "XXXXXXXXXXXXXXXXXXXXXXXXX",
    F_STRING, (PTR) address, CUG_dfmp );
CUG_fld3 = fld_def( 9, 33, NULLP, FADJACENT, "XXXXXXXXXXXXXXXXXXXXXXXXX",
    F_STRING, (PTR) city, CUG_dfmp );
CUG_fld4 = fld_def( 10, 33, NULLP, FADJACENT, "!!",
    F_STRING, (PTR) state, CUG_dfmp );
CUG_fld5 = fld_def( 10, 48, NULLP, FADJACENT, "UUUUU-UUUU",
    F_STRING, (PTR) zip, CUG_dfmp );
CUG_fld6 = fld_def( 12, 33, NULLP, FADJACENT, "(UUU)UUU-UUUU",
    F_STRING, (PTR) phone, CUG_dfmp );
CUG_fld7 = fld_def( 13, 33, NULLP, FADJACENT, "(UUU)UUU-UUUU",
    F_STRING, (PTR) fax, CUG_dfmp );
MEL: A Metalanguage Processor George Crews George M. Crews received his bachelor's in General Engineering from the University of Nevada at Las Vegas, and his master's in Engineering Science from the University of Tennessee at Knoxville. He is a "generalist" with over 15 years experience in mechanical and software engineering design and analysis. He may be contacted at 109 Ashland Lane, Oak Ridge, TN 37830, (615) 481-0414. As a mechanical engineer, my experience with analysis programs falls in the areas of structural stress, fluid dynamics, heat conduction, and thermal/hydraulic system simulation. Such programs present the technical software developer with a number of unique problems, not least of which is providing a user-friendly interface. Though program users tend to be computer literate, input data can often be voluminous and tedious to prepare; the typical user may make many runs with only slight modifications, as design optimization is often accomplished by repeated analysis. Both input and output must be stored and presented in a manner that allows independent verification and validation. Finally, the information output from one program may be required as input by another. Another big headache is that modern (i.e., graphical) user interfaces tend to be hardware or system-software specific. A good universal interface would free the developer from the nuances of different machines and operating systems, while at the same time representing a standard that machine-specific routines can work with. MEL is my solution for making such technical programs more user-friendly and modularized. MEL (for MEtaLanguage data processor) is a set of input/output utilities that provides a standard interface between the program and the user. It can translate input data written in "pseudo-English" (Example 1), making the data available to the program as variables (Example 2). 
It can also translate program variables (Example 3) into pseudo-English (Example 4). Effort was made to provide data objects that could be easily incorporated into almost any engineering analysis program (Example 5). First, the pseudo-English look of MEL means that I/O will be more readable and comprehensible to the user (or checker). Second, MEL is object oriented in that it provides a structured and encapsulated I/O interface. Thus, development time will be reduced and future changes can be made to the program more easily. Third, MEL's grammar is simple and unambiguous, with input and output formats identical, so that output from one program may serve directly as input to another. Finally, MEL can read and write data directly to a file so that a permanent record of a run and its results is available. Description In MEL the smallest unit of pseudo-English I/O is called a "descriptor." Its purpose is to describe something, either data or a command, to a program. The general format for descriptors is much like a function call in a typical programming language: an I/O unit consists of a descriptor name (somewhat like a function name), followed by a parameter list, followed by an end-of-unit symbol (the semicolon). For example, consider the following MEL descriptor, which could be used as part of the input to a piping network analysis program: 
pipe, length = 100 (ft), diameter = 6 (in); 
This is a pipe descriptor whose parameters are length and diameter. The values assigned to these parameters would be 100 and 6, in units of feet and inches, respectively. Although the tokens (names and parameters) making up descriptors are customized by the developer for each individual application program, the above grammar remains the same for all programs using MEL. (See Example 1 and Example 4.) MEL's format was chosen for its simplicity, while allowing for as much flexibility as possible without introducing ambiguity. 
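To make the grammar concrete, here is a hypothetical C sketch of splitting a descriptor into its name and its "parameter = value (units)" parts. The real meli() routines are dictionary-driven and handle abbreviation, default parameter order, comments, and arrays; none of that is attempted here, and all names are illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative only: one parsed "name = value (units)" parameter. */
struct param { char name[32]; double value; char units[16]; };

/* Split a MEL-style descriptor such as
       pipe, length = 100 (ft), diameter = 6 (in);
   into its name and numeric parameters.
   Returns the number of parameters parsed, or -1 on error. */
int parse_descriptor(const char *text, char *dname,
                     struct param *p, int maxp)
{
    char buf[256];
    char *tok;
    int n = 0;

    strncpy(buf, text, sizeof buf - 1);   /* strtok modifies its input */
    buf[sizeof buf - 1] = '\0';

    tok = strtok(buf, ",;");              /* descriptor name comes first */
    if (tok == NULL)
        return -1;
    sscanf(tok, " %31s", dname);

    while ((tok = strtok(NULL, ",;")) != NULL && n < maxp) {
        /* each piece looks like " length = 100 (ft)" */
        if (sscanf(tok, " %31[^= ] = %lf ( %15[^)] )",
                   p[n].name, &p[n].value, p[n].units) == 3)
            n++;
    }
    return n;
}
```

The grammar's regularity is what makes such a parser small: every descriptor, for every application, has the same comma-separated, semicolon-terminated shape.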
In MEL, tokens may be abbreviated as long as they remain uniquely identifiable. MEL assumes a default parameter order if parameter names are missing. Comments may be included by enclosing them in double quotes; parameter values may be labeled as "unknown," etc. These format choices are designed to make programs incorporating MEL as convenient to the user as possible. Incorporating MEL In order to incorporate MEL into one of your own programs, you must customize the mel.h header file to be included in your application source code file. First create a "dictionary" for both input and output that defines the proper spelling, number, and types (integer, array, etc.) of data associated with each descriptor and parameter. (Note that by simply changing spellings in the dictionary you could go from pseudo-English to "pseudo-French" or some other "pseudo-language.") The task of defining dictionaries has been made as painless as possible by providing complete instructions and an example program on the MEL diskette available through the CUG library. (The diskette contains MEL source code, header file, documentation and instructions, an example program, and a conversion factor routine. Since a listing of all MEL routines would run over 50 pages, a complete listing has not been included with this article.) You will need to prepare documentation for the user, defining the dictionaries and explaining what the tokens mean. To obtain data from a descriptor, you must first read it and then extract the data (see Example 2). An example of outputting data is shown in Example 3. Allowing the user to input data with different units requires conversion to internal units (ASTM, 1982). Included on the MEL diskette is a routine that can convert more than 150 different units. Additional units and conversion factors can easily be added to the source code. How MEL Was Developed An early decision was to write MEL in C. 
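A conversion routine of the kind described might look like the following C sketch. The table layout and function names here are assumptions, not the diskette's actual code; the factors shown are the standard exact definitions.

```c
#include <string.h>

/* Hypothetical sketch of a unit-conversion table: each entry maps a
   unit name to the factor that converts it to an internal (SI) base
   unit. A real table would cover 150+ units and check dimensions. */
struct unit { const char *name; double to_si; };

static const struct unit length_units[] = {
    { "m",  1.0    },
    { "ft", 0.3048 },   /* exact by definition */
    { "in", 0.0254 },   /* exact by definition */
};

/* convert a value in the named unit to internal SI units;
   returns 0 on success, -1 if the unit is not in the table */
int to_internal(double value, const char *name, double *out)
{
    size_t i;
    for (i = 0; i < sizeof length_units / sizeof length_units[0]; i++) {
        if (strcmp(length_units[i].name, name) == 0) {
            *out = value * length_units[i].to_si;
            return 0;
        }
    }
    return -1;
}
```

Adding a new unit is then a one-line table entry, which matches the article's claim that "additional units and conversion factors can easily be added to the source code."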
Fortran is the traditional language for scientific programs; however, engineers like myself are beginning to realize that there is more to technical software development than simply correctly coding a complex algorithm. ANSI C has a number of significant non-numerical advantages over Fortran (Kempf, 1987). C allows for more flexible structured programming and data encapsulation techniques to be applied (also see Jeffery, 1989). C has more operators and program control constructs than Fortran. C allows indirection (pointers) where Fortran does not. C more easily interfaces to existing system software since much of this software is itself written in C. Also, C is a popular language for unconventional computer architectures such as parallel processors (Lusk, 1987) and neural networks. Let me also mention some of C's shortcomings, which are related to its relative naivete for scientific purposes. Dynamic array dimensioning in C is convoluted (Press, 1988). C does not have the numerical library that Fortran does. And finally, C does not allow operator overloading for data structures (complex numbers for example) nor does it have an exponentiation operator. However, I do not think these deficiencies are difficult to overcome. Partly as an experiment to form my own opinion about OOP, the design of MEL incorporates the object-oriented paradigm. I chose to make use of C's preprocessor to restrict the visibility of public type, function, and data declarations to just those objects that the application program may need at a certain place (see Example 5). (The private type, function, and variable data needed by the MEL routines themselves are not shown in the example and are hidden from your program by other defined/undefined manifest constants.) For another approach refer to the article by Jeffery. Summary And Future Enhancement Software engineering is rapidly evolving and everyone seems to have his or her own ideas about what makes a good user-interface. 
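The dynamic-dimensioning complaint can be illustrated with the usual row-pointer workaround, in the spirit of Press et al.; this sketch is mine, not code taken from the references.

```c
#include <stdlib.h>

/* The row-pointer technique for a dynamically dimensioned 2-D array:
   one contiguous block for the data and one block of row pointers,
   so that m[i][j] indexing works as usual. */
double **matrix_alloc(size_t rows, size_t cols)
{
    double **m = malloc(rows * sizeof *m);
    size_t i;
    if (m == NULL)
        return NULL;
    m[0] = calloc(rows * cols, sizeof **m);  /* data, zero-initialized */
    if (m[0] == NULL) {
        free(m);
        return NULL;
    }
    for (i = 1; i < rows; i++)
        m[i] = m[0] + i * cols;   /* each row points into the one block */
    return m;
}

void matrix_free(double **m)
{
    if (m != NULL) {
        free(m[0]);   /* the data block  */
        free(m);      /* the row vector  */
    }
}
```

In Fortran this is a one-line declaration; in C it takes an allocator, a deallocator, and care with ownership, which is exactly the "convoluted" quality being criticized.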
I believe MEL is a practical answer to the spectrum of interface problems confronting the developer and user of complex technical programs. Some may criticize MEL for its verbosity (as compared to Fortran's fixed field format), the time a user must spend learning to use MEL (versus a more interactive interface), and the somewhat clumsy way objects must be (or at least, were) encoded in C. These points are legitimate and are inherent in MEL's design. No design can be all things to all people. The next steps in MEL's evolution might be incorporating it into a language sensitive editor, a graphical output post-processor, and perhaps later, into an expert system shell specialized for the type of analysis being performed. Bibliography George M. Crews, "HAPN--A Hydraulic Analysis of Piping Networks Program," Master's Thesis in Engineering Science, University of Tennessee, Knoxville, 1989. A portion of this thesis describes MEL and how it was developed and used for a specific analysis program. David Jeffery, "Object-Oriented Programming in ANSI C," Computer Language Magazine, February, 1989. This article discusses the object-oriented paradigm and a way to implement it in C. James Kempf, Numerical Software Tools in C, Prentice-Hall, Inc., 1987. This book contains an introduction to both numerical programming and C. The emphasis of the text is on creating small routines that can be used as building blocks for larger programs. Possible shortcomings are its lack of data hiding and that it treats doubly dimensioned arrays statically rather than dynamically. Ewing Lusk, Overbeek, et al., Portable Programs for Parallel Processors, Holt, Rinehart and Winston, Inc., 1987. This book describes a set of C tools for use on a broad range of parallel machines. William H. Press, Flannery, et al., Numerical Recipes in C, Cambridge University Press, 1988. Based on an earlier Fortran edition, this is a great cookbook giving a wide range of oven-tested recipes for the numerical gourmet. 
It shows the correct way to handle multidimensioned arrays (dynamically). A complaint sometimes heard is that a few of the algorithms are getting obsolete due to rapid advances in numerical techniques. ASTM E 380-82 Standard for Metric Practice, American Society for Testing and Materials, 1982. This standard contains many useful conversion factors between English and metric units. 
Listing 1 
Example 1. An Example of MEL Input for a Hydraulic Analysis Program. (Note that tokens will be unique to each application.) 
title, 'Example Problem Illustrating MEL';
fluid, "water" density = 62.4 (lbm/ft3), viscosity = 1 (cp);
node, 1, pressure = 8.67 (psi); "20 ft of water"
branch, 100, from_node = 1, to_node = 2;
pipe, length = 100 (ft), id = 6 (in), material = steel;
end_of_branch;
node, 2, pressure = 6.5 (psi); "15 ft of water"
next;
Listing 2 
Example 2. Example of Obtaining Data From a MEL Descriptor: 
Descriptor:
pipe, length = 100 (ft), diameter = 6 (in);
Code fragment:
double pipe_length, diameter;
union meli_param_data data; /* see Example 5. */
char units[MAX_STRING_LEN+1];
int array_len;
int unknown_flag;

meli(); /* reads descriptor */
meli_data("length", &data, units,
    &array_len, &unknown_flag);   /* gets pipe length */
pipe_length = data.real;          /* will equal 100 */
meli_data("diameter", &data, units,
    &array_len, &unknown_flag);   /* gets pipe diameter */
diameter = data.real;             /* will equal 6 */
/* note that units, array_len, and unknown_flag are not considered (used). */
Listing 3 
Example 3. Example of Outputting a MEL Descriptor: 
Code fragment:
double pipe_length = 100, diameter = 6;
union melo_param_data data; /* see Example 5. */
char length_units[] = "ft";
char diameter_units[] = "in";
int array_len = 0;
int unknown_flag = 0;

melo_init("pipe"); /* initialize */
/* get data ready to output: */
data.real = pipe_length;
melo_data("length", &data, length_units, array_len, unknown_flag);
data.real = diameter;
melo_data("diameter", &data, diameter_units, array_len, unknown_flag);
melo(); /* translates data into string */
Descriptor:
pipe, length = 100 (ft), diameter = 6 (in);
Listing 4 
Example 4. An Example of Output Generated by a Hydraulic Analysis Program Using MEL. (From the input data given in Example 1.) 
program, name = 'HAPN - Hydraulic Analysis of Piping Networks',
    problem_title = 'Example Problem Illustrating MEL';
message, text = 'Date: Thu Jul 13 09:02:11 1989';
message, text = 'Input filename: input';
equations, node = 0, loop = 0, iterations = 7;
branch, number = 100, type = 'independent_branch',
    flow_rate = 436238 (lbm/h), flow_change = -6.20476e-007 (%),
    flow_dp = 2.17 (psi), elevation_dp = 0 (psi);
component, branch_number = 100, component_number = 0, type = 'pipe',
    resistance = 4.95228 (Pa*s2/kg2),
    change_resistance = -1.24095e-008 (%), pressure_drop = 2.17 (psi);
node, number = 1, pressure = 8.67 (psi);
node, number = 2, pressure = 6.5 (psi);
next;
Listing 5 
Example 5. Public Interface Between MEL and Any Application Program Using It. (Excerpted from mel.h header file.) 
/* if using MEL for input (#define MEL_INPUT), then must define the MEL input data object: */
#ifdef MEL_INPUT
/* firstly, define input constants (all must be CUSTOMIZED for specific application program): */
#define MELI_MAX_DESCRIP_STR_LEN 256
/* maximum number of characters in any input descriptor string. */
#define MELI_MAX_PARAMS 6
/* maximum number of parameters for any descriptor (min num = 1). 
*/
#define MELI_MAX_PARAM_STR_LEN 80
#define MELI_MAX_PARAM_ARRAY_STR_LEN 1
/* largest allowable parameter string lengths (min size = 1) */
#define MELI_MAX_PARAM_INT_ARRAY_LEN 1
#define MELI_MAX_PARAM_REAL_ARRAY_LEN 1
#define MELI_MAX_PARAM_STR_ARRAY_LEN 1
/* maximum number of elements in parameter data arrays (min = 1). */
#define MELI_UNITS_STR_LEN 80
/* maximum length of units associated with any param (min = 1) */

/* secondly, define input data structures: */
union meli_param_data {
    int integer; /* also holds boolean type */
    double real;
    char string[MELI_MAX_PARAM_STR_LEN+1];
    int integer_array[MELI_MAX_PARAM_INT_ARRAY_LEN];
    double real_array[MELI_MAX_PARAM_REAL_ARRAY_LEN];
    char string_array[MELI_MAX_PARAM_STR_ARRAY_LEN]
                     [MELI_MAX_PARAM_ARRAY_STR_LEN+1];
};
/* this is used for input parameter data. it may either be an integer, real, string, array of integers, array of reals, or an array of strings. (to save space a union was used.) */

/* thirdly, define input variables: */
char meli_descriptor_string[MELI_MAX_DESCRIP_STR_LEN+1];
/* global storage for the input descriptor string. */

/* lastly, define input functions (typically they return 0 if no error encountered, else some nonzero error code): */
int meli_file(FILE *meli_file_handle);
/* read a descriptor string from the input stream and call meli(). also, put copy of string read into meli_descriptor_string. */
int meli(void);
/* translate meli_descriptor_string and put information into a private data structure (meli_datum). */
char *meli_descrip_type(void);
/* return pointer to name of type of descriptor read by meli(). */
int meli_num_params(void);
/* return number of parameters read by meli(). */
int meli_param(int param_num, char *param, union meli_param_data *data,
    char *units, int *array_len, int *unknown_flag);
/* fill argument list with param_num'th parameter read by meli(). (start with param_num = 0.) 
*/
int meli_data(char *param, union meli_param_data *data, char *units,
    int *array_len, int *unknown_flag);
/* see if *param was input. if it was, then fill argument list with data from meli_datum. */
#endif /* MEL_INPUT */

/* if using MEL for output, must define the MEL output data object: */
#ifdef MEL_OUTPUT
/* firstly, define output constants (all must be CUSTOMIZED): */
#define MELO_MAX_DESCRIP_STR_LEN 256
/* how many characters can be in an output descriptor string? */
#define MELO_MAX_PARAMS 6
/* maximum number of parameters for any descriptor. */
#define MELO_MAX_PARAM_STR_LEN 80
#define MELO_MAX_PARAM_ARRAY_STR_LEN 1
/* largest allowable parameter string length. */
#define MELO_MAX_PARAM_INT_ARRAY_LEN 1
#define MELO_MAX_PARAM_REAL_ARRAY_LEN 1
#define MELO_MAX_PARAM_STR_ARRAY_LEN 1
/* maximum number of elements in array of parameter data. */
#define MELO_UNITS_STR_LEN 80
/* maximum string length of any units associated with a param. */

/* secondly, define output data structures: */
union melo_param_data {
    int integer;
    double real;
    char string[MELO_MAX_PARAM_STR_LEN+1];
    int integer_array[MELO_MAX_PARAM_INT_ARRAY_LEN];
    double real_array[MELO_MAX_PARAM_REAL_ARRAY_LEN];
    char string_array[MELO_MAX_PARAM_STR_ARRAY_LEN]
                     [MELO_MAX_PARAM_ARRAY_STR_LEN+1];
};
/* this is for output parameter data. it may either be an integer, real, string, array of integers, array of reals, or an array of strings. (to save space a union was used.) */

/* thirdly, define output variables: */
char melo_descriptor_string[MELO_MAX_DESCRIP_STR_LEN+1];
/* global storage for the output descriptor string. */

/* lastly, define output functions (typically return 0 if no error): */
int melo_init(char *descrip_type);
/* initialize private data structure (melo_datum) to accept parameter data from following functions. output descriptor type will be descrip_type. returns 0 if no errors were encountered. 
*/
int melo_data(char *param, union melo_param_data *data, char *units,
    int array_len, int unknown_flag);
/* put data for parameter *param into the proper place in melo_datum. returns zero if no errors were encountered. */
void melo(int melo_verbose_flag);
/* takes the information in melo_datum and translates it into melo_descriptor_string. user must set melo_verbose_flag = 1 to make output as readable as possible, set it equal to zero to make output as terse as possible (and still remain in MEL format). */
int melo_file(FILE *melo_file_handle, int melo_verbose_flag);
/* take the information in melo_datum, translate it into melo_descriptor_string, and output it to file. */
#endif /* MEL_OUTPUT */

/* now define data objects common to both input and output: */
/* if an error occurs, MEL will try and tell you what happened. so define required error handling information: */
#define MEL_MAX_ERR_MSG_LEN 80
struct mel_errors {
    enum { /* which error occurred? */
        mel_no_err,
        mel_read_err,
        mel_write_err,
        mel_end_of_file_err,
        mel_end_of_data_err,
        mel_syntax_err,
        mel_unknown_descrip_name_err,
        mel_unknown_param_name_err,
        mel_missing_param_name_err,
        mel_param_data_err,
        mel_missing_paren_err,
        mel_too_many_param_err,
        mel_missing_bracket_err
    } type;
    int start_line; /* on which lines did err occur? */
    int end_line;   /* (meaningful for input only.) */
    char msg[MEL_MAX_ERR_MSG_LEN+1]; /* additional info describing err */
} mel_err; /* (not same as messages below). 
*/
#define MEL_MAX_NUM_ERR_MESSAGES 13
#ifdef MEL_INIT
/* the following describes each type of enumerated error: */
char mel_err_msg[MEL_MAX_NUM_ERR_MESSAGES][MEL_MAX_ERR_MSG_LEN+1] = {
    "No errors encountered",
    "Can't read file",
    "Can't write file",
    "Unexpected end of file encountered",
    "End of input data encountered",
    "Descriptor/parameter syntax error",
    "Unknown descriptor name",
    "Unknown parameter name",
    "A (or another) parameter name was expected but is missing",
    "Unable to read parameter value(s) for this descriptor",
    "Missing right parenthesis while reading units",
    "Too many (or duplicate) parameters given for this descriptor",
    "Missing brackets around array data"
};
#else
extern char mel_err_msg[MEL_MAX_NUM_ERR_MESSAGES][MEL_MAX_ERR_MSG_LEN+1];
#endif /* MEL_INIT */
Object-Oriented Programming As A Programming Style Eric White Eric White is a software engineer at Advanced Programming Institute, Ltd. He is working on a character-based version of XVT. XVT is a common programming interface with implementations for various window systems, including Macintosh, Microsoft Windows, Presentation Manager, OSF/Motif, and character-based environments on UNIX and MS-DOS. He can be reached at API at (303) 443-4223. Object-oriented programming is a programming style that can be used in many languages, including C and C++. Some programmers think that C++ gives them the ability to do object-oriented programming. This isn't accurate -- C programmers can already do object-oriented programming. I will demonstrate by showing two identically structured object-oriented programs, one in C and the other in C++. Even though one can do object-oriented programming in C, C++ offers several advantages: C++ supplies syntactic support for object-oriented programming, and C++ provides type checking that is not possible in C. I am assuming the reader has already read one of the numerous magazine articles that introduce object-oriented programming. 
A good article is "Mapping Object Oriented Concepts Into C++ Language Facilities", CUJ July '89 by Tsvi BarDavid. If you already know C, an example of object-oriented programming in C can clarify exactly what goes on in object-oriented programming. Once you understand the C example, the identical example in C++ can make learning C++ easier. You can even imagine how the code generated by a C++ translator looks. The Example I'll develop the comparison using a graphical application that could be the beginnings of a drawing program such as MacDraw. This example is constructed with four classes of objects: graph_obj, circle, square, and double_circle. Three instructions can be given to any one of these objects: 
init, which takes as arguments the initial position and size of the object. init initializes the object, then draws it. 
move, which draws the object in black, modifies the position, then draws it in white. move takes a change in the y and x coordinates as arguments. 
draw, used by init and move. draw takes a color as an argument. 
The Listings Listing 1 is the pseudo-code for the example. The code in Listing 2 (obj.h) and Listing 3 (obj.c) facilitates object-oriented programming in C, allowing the creation of classes, methods, and objects, and implementing inheritance. Listing 4 (drawc.c) and Listing 5 (drawcxx.cxx) are two examples of object-oriented code in C and C++ respectively. They perform identically. In the pseudo-code, you can see: 
We derive classes circle and square from class graph_obj. 
We derive class double_circle from class circle. 
All classes inherit the method move from class graph_obj. If method move needs to be invoked for an object of class circle, then method move of class graph_obj is actually the function called. We are able to reuse the move method for every class in this example. 
Class double_circle inherits the method init from class circle. 
Class double_circle overrides the method draw from class circle. 
If method draw needs to be invoked for an object of class double_circle, then the method is not inherited from the super-class. For portability, I isolate the graphics functions in a utility module. Listing 6 (utility.h) is the interface to the utility module. Listing 7 (utility.c) contains fatal() and the graphical functions. The utility module is compiled and linked with either the C or C++ code. The isolation also makes it easier to compare the two object-oriented examples. Object-Oriented Programming In C This system implements classes, methods, objects, inheritance, and messages. The entire module that facilitates object-oriented programming is less than 90 lines of code. I'll start with a simple data abstraction mechanism, then develop it into a system that supports classes, inheritance, and messages. The most natural means of creating an object and associating methods with it is to put pointers to the methods (pointers to functions) in a structure along with the data. A structure for an instance of a circle might look like this: 
struct {
    int y;
    int x;
    int radius;
    void (*init)();
    void (*draw)();
    void (*move)();
} circle;
This implements an object that knows how to initialize itself, draw itself, and move itself. The implementation could vary for different types (such as a double circle). However, we might get tired of setting up the methods every time we create a new instance of a circle. A solution is to design another structure (called a class) that contains the pointers to the functions, and place only a pointer to the class in each object. With this technique we may create a class once, then create several objects and have them point to that class. To make the class structure more generic, we define an array of pointers to functions, and by convention, define the methods as an index into this array. 
The code now looks like:

/* defines for methods */
#define INIT 0
#define DRAW 1
#define MOVE 2

struct class {
    int nbr_methods;
    void (**method)();
};

typedef struct class CLASS;

struct {
    CLASS *class;
    int y;
    int x;
    int radius;
} circle;

When creating a class, we need to initialize the array of pointers to functions after allocating memory for it. If the method is implemented in the class itself, then the pointer is set to the function address. If the method is inherited from the super-class, then the pointer is loaded from the super-class. To make an object more generic, we'll take the definition of the data out of the object and replace it with a pointer to the data. Space for the data is allocated when the object is created and freed when the object is no longer needed. Listing 2 contains the final definitions of the structures for class and object.

Classes

To define a class:

1. Define a structure to hold the information about the class. (Listing 6, lines 15-18)
2. Write the methods (the functions associated with the class). An example is the DRAW method for class circle. (Listing 6, lines 69-81)
3. Declare a structure of type CLASS. (Listing 6, line 143)
4. Call new_class(), which loads the pointers to the inherited methods from the super-class. It also saves the size of memory needed for each object in the class. (Listing 6, line 160)
5. Call reg_method() to register each method that we want to implement in the class being created. Registering a method means storing a pointer to a function in the array of pointers to functions. reg_method() shouldn't be called for methods inherited from the super-class. (Listing 6, lines 161-162)

Methods

A method is a function written specifically to go with the class. In this example, methods don't return a value. All methods should be aware that obj->data is a pointer to the data allocated on the heap. For a particular class, this data is of an assumed structure type.
By casting obj->data to a pointer to a structure, the method can access the object data correctly. All methods receive the argument arg_ptr, which can be used with the macro va_arg() if there are arguments to the method. See your documentation on stdarg.h.

Objects

The structure that holds what we need to know about an object is:

typedef struct {
    void *data;
    CLASS *class;
} OBJECT;

To create and use an object:

1. Declare a structure of type OBJECT. (Listing 6, line 148)
2. Call the function new_object(), which registers a class with the object and allocates memory for the object. (Listing 6, line 174)
3. Send messages to the object. With the graphical objects in the example, the first message that we want to send is the INIT message (Listing 6, line 175). After that, we can send MOVE or DRAW messages. (Listing 6, line 186)
4. When done with the object, call free_object(), which frees the allocated memory. (Listing 6, line 191)

Inheritance

Inheritance of methods is demonstrated here. circle inherits MOVE from class graph_obj. double_circle inherits INIT and MOVE from its super-classes. I implement inheritance of data structures by having a sub-class allocate more memory than the super-class. The sub-class data consists of the parent-class data followed by the data specific to the sub-class.

Messages

There is a distinction between a message and a method. A message is sent to an object, and then something decides which method to invoke. Invoking a method means calling the function that is part of the class. In C++, the translator decides which method to invoke. In the system implemented in C, the function message() (Listing 3) decides, based on the class of the object.

Summary Of OOP In C

One disadvantage of doing object-oriented programming in C is that there is no function prototyping. We have no idea what the arguments to a method are when we declare the pointers to functions in the class structure.
Programmers are responsible for sending the correct parameters to a message. Another disadvantage is that when writing methods, the programmer must access the data in the object correctly. The pointer to the data in the object structure must be cast as a pointer to the correct structure type.

Object-Oriented Programming In C++

The C++ example also demonstrates classes, methods, objects, inheritance, and messages. I'll explain a small subset of the syntax of C++, only what is essential to do object-oriented programming. There are many features of C++ that have nothing to do with object-oriented programming, and the object-oriented programming part of C++ is elaborate, with useful but nonessential features. The subset is:

Definition of a class, with and without a super-class.
Definition of a method.
Declaration of an object.
Sending a message to an object.

Classes

The three essential pieces of a class are: the data structure of the class, the super-class if there is one, and the methods. The definition of a class in C++ looks like:

class graph_obj {
public:
    int y;
    int x;
    void init(int y, int x);
    void move(int y, int x);
    virtual void draw(int color){};
};

y and x are the data that will be contained in an object of class graph_obj. To define methods, you put the function prototype for each method in the definition of the class. The class graph_obj doesn't have a super-class. When defining a class that does have a super-class, you follow the name of the class with a colon (:), the keyword public, and the name of the super-class. For example:

class circle : public graph_obj {
public:
    int radius;
    void init(int y, int x, int radius);
    void draw(int color);
};

Members of a class may be private or public. For simplicity's sake, all members of all classes in this example are public. I'm not attempting to do data-hiding in this example. Hiding data is a separate (and important) issue, but it is beyond the scope of this article.
The keyword public before the name of the super-class means that all the public members of the super-class are public members of the sub-class.

Methods

The definition of a method looks similar to that of a function. To form the name of the function, you follow the class name with the scope resolution operator (::) and the name of the method. For example, the draw method for class circle looks like this:

void circle::draw(int color)
{
    /* code to draw a circle */
    ...
}

Here is an important note about coding a method. A hidden argument to every method is the object itself. When a method is invoked for a particular object, by definition you get access to that object: you can access the members of that object just by using their names. Methods are invoked much as functions are called in C. Sometimes, when writing code for a method, we want to force the super-class's version of a method to be invoked, even though the class we are writing has a method of the same name. In this case, we can use the scope resolution operator (::) to force the method of the super-class to be invoked. In the init method for class circle, to invoke the init method of class graph_obj, we specify the name of the class, followed by the scope resolution operator, followed by the name of the method. Sometimes the method to invoke can't be determined until run time, because a particular section of code could be operating on many types of objects. In C++, such code must be operating on objects of a certain class, or of a sub-class of that class. If you declare a method virtual in the class highest in the class hierarchy, C++ will wait until run time to decide which method to invoke, and will invoke the correct method for the object being operated on. To do this, C++ puts something in the object that indicates which class it belongs to. Resolution of the method to invoke at run time is called late binding.
This is useful when you send messages through pointers to objects, where the pointer could point to one of several classes of objects. It's also useful in a method that serves a class and its sub-classes. draw is virtual because the method move (which uses the method draw) in class graph_obj also serves classes circle, double_circle, and square. In C++, each class can have two special methods: the constructor and the destructor. Essentially, the constructor is called automatically when an object comes into scope, and the destructor is called when an object goes out of scope. For example, if you declare an automatic object at the start of a function, the constructor is called at the time of declaration, and the destructor is called before the function returns to its caller. Constructors and destructors are not essential to object-oriented programming. In other systems, programmers write a method specifically for initializing an object when they need one, then send that message to the object after creating it. In the C++ example that accompanies this article, I don't use the built-in constructors and destructors. In both the C and C++ examples, I have a method that initializes the values of the graphical object. I call this method INIT. In the C example, I use a function that allocates memory for the object before use and frees the memory after use. These functions aren't defined as part of a class and should not be confused with methods.

Objects

An object declaration looks like a declaration of something for which there is a typedef in C. A declaration of an object of class circle looks like:

circle c1;

In the graphics example, immediately after declaring a graphical object, the init message is sent to the new object. This gives the object its starting position and size, and draws it on the screen. Listing 7, line 99 shows initialization of a circle at position (40, 40), with a radius of 20.
After sending the init message, we can send a move message to the object, causing it to move on the screen. (Listing 7, lines 103-105) In the C example, we use a pointer in an object to point to the data specific to that instance of the object. new_object() allocates that data on the heap, and free_object() frees it. In contrast, the C++ translator actually creates a structure that contains the data. In our example, this structure is an automatic structure; space for it is deallocated when main() returns. We don't need to free any data on the heap as we did in the C example.

Inheritance

Just as in the C example, the C++ example demonstrates inheritance of methods. double_circle inherits init and move from class circle.

Messages

Sending a message in C looks like:

message(&c1, MOVE, 1, 1);

Sending a message in C++ looks like:

c1.move(1, 1);

We specify the same essential elements in both cases: the object (c1), the message (MOVE or move), and the number of pixels to move in the y and x directions.

Summary Of OOP In C++

Data hiding and modularity are important issues in C++, as in other languages. I am not addressing these issues, and have put the entire program in one source file; I want to focus on the object-oriented aspect and keep it simple. Often in C++, when a message is sent to an object of a known type, the compiler resolves the particular method to invoke at compilation time. This is called early binding. In contrast, the function message() in the C scheme presented here resolves which method to invoke at run time. This is called late binding. Because the C methodology always does late binding, a little more code must always be executed at run time, so the C code may be a bit slower than the code generated by the C++ translator. However, when virtual functions are used, I believe the speed of sending a message in C is comparable to C++. C++ inherits many of the characteristics of C.
In C++, you have the ability to corrupt memory in the same ways that you can corrupt memory in C. This causes temporal and referential non-localization of bugs. C++ also offers the same beneficial characteristics as C, such as speed, compactness, and the possibility of portability.

Portability

The C code is quite portable and runs under:

Microsoft C v5.1
Microsoft Quick C v2.0
Zortech C compiler

The C++ code runs under:

Zortech C++ compiler
Glockenspiel C++ translator using the Microsoft C compiler v5.1

The graphics code works on CGA, EGA, Hercules, and VGA. The utility module can use either the graphics library that accompanies Microsoft C v5.0 or the graphics library that comes with the Zortech C++ compiler. If you are using the Microsoft graphics library and Hercules graphics, you need to run MSHERC.COM before you can run these programs. The Zortech graphics library has its origin at the lower-left corner; Microsoft has its origin at the upper-left corner. Also, because pixels are not square, neither the Zortech nor the Microsoft library draws perfectly round circles. Because this article focuses on object-oriented techniques and not on graphical techniques, I didn't address any of these problems.

Exercises

A few valuable exercises might be:

Make a new class, such as a diamond.
Make a new method, such as expand or contract, that changes the size of an object.
Adapt this system to another graphical system.

Acknowledgements

I thank Marc Rochkind and Tom Cargill, who taught me much of what I know about object-oriented programming.

Listing 1

Class Graphical Object

Graphical Object is an abstract class. There will never be any instances of this class. Classes Circle and Square are subclasses of this class.
Graphical Object data:
  y position
  x position

Graphical Object methods:
  INITIALIZE
    Arguments:
      Starting y position
      Starting x position
  DRAW
    Only implemented by subclasses
  MOVE
    Arguments:
      Increment in the y direction
      Increment in the x direction
    Send the draw black message to the object (erase the object).
    Modify the x and y position of the object per the arguments passed to the MOVE method.
    Send the draw white message to the object.

Class Circle

Circle is a subclass of class Graphical Object.

Circle data (in addition to Graphical Object data):
  radius of the circle

Circle methods:
  INITIALIZE
    Arguments:
      Starting y position
      Starting x position
      Radius
    Send the INITIALIZE message to class Graphical Object.
    Store the radius in the Circle data.
    Send the DRAW message to the Circle.
  DRAW
    Argument:
      Color of the circle to be drawn.
    Draw the circle on the screen.
  MOVE
    Inherited from the class Graphical Object.

Class Square

Square is a subclass of class Graphical Object.

Square data:
  the length of a side of the square

Square methods:
  INITIALIZE
    Arguments:
      Starting y position
      Starting x position
      Size
    Send the INITIALIZE message to class Graphical Object.
    Store the size in the Square data.
    Send the DRAW message to the Square.
  DRAW
    Argument:
      Color of the square to be drawn.
    Draw the square on the screen.
  MOVE
    Inherited from the class Graphical Object.

Class Double_circle

Class Double_circle is a subclass of class Circle.

Double_circle data:
  Same as for a Circle.

Double_circle methods:
  INITIALIZE
    Inherited from class Circle.
  DRAW
    Argument:
      Color of the Double_circle to be drawn.
    Draw a circle on the screen.
    Draw a slightly smaller concentric circle on the screen.
  MOVE
    Inherited from class Circle.

Listing 2

001 /* obj.h - Interface to module for object oriented
002    programming in C.
    */
003
004 struct class {
005     int size;              /* size of data */
006     int nbr_methods;
007     void (**method)();
008 };
009
010 typedef struct class CLASS;
011
012 typedef struct {
013     void *data;
014     CLASS *class;
015 } OBJECT;
016
017 void new_class(CLASS *class, CLASS *super_class,
018     int nbr_methods, int size);
019 void reg_method(CLASS *class, int mth, void (*fcn)());
020 void new_object(OBJECT *obj, CLASS *class);
021 void message(OBJECT *obj, int msg, ...);
022 void free_object(OBJECT *obj);
023 void free_class(CLASS *class);

Listing 3

001 #include <stdio.h>
002 #include <stdlib.h>
003 #include <stdarg.h>
004 #include "utility.h"
005 #include "obj.h"
006
007 void new_class(CLASS *class, CLASS *super_class,
008     int nbr_methods, int size)
009 {
010     int x;
011     class->nbr_methods = nbr_methods;
012     class->size = size;
013     class->method =
014         (void (**)())malloc
015         ((unsigned)(nbr_methods * sizeof(void (*)())));
016     for (x = 0; x < nbr_methods; ++x)
017         class->method[x] = (void *)NULL;
018     if (super_class != NULL)
019         for (x = 0; x < super_class->nbr_methods &&
020                 x < class->nbr_methods; ++x)
021             class->method[x] = super_class->method[x];
022 }
023
024 void free_class(CLASS *class)
025 {
026     free(class->method);
027 }
028
029 /* register a method with a class */
030 void reg_method(CLASS *class, int mth, void (*fcn)())
031 {
032     if (mth < 0 || mth >= class->nbr_methods)
033         fatal("attempting to register an invalid method");
034     class->method[mth] = fcn;
035 }
036
037 /* initialize an object */
038 void new_object(OBJECT *obj, CLASS *class)
039 {
040     void *v;
041     obj->class = class;
042     v = malloc((unsigned)class->size);
043     if (v == NULL)
044         fatal("smalloc failed");
045     obj->data = (void *)((unsigned char *)v);
046 }
047
048 /* send a message to an object */
049 void message(OBJECT *obj, int msg, ...)
050 {
051     va_list arg_ptr;
052     va_start(arg_ptr, msg);
053     if (obj->class->method[msg] == NULL)
054         fatal("no method for this class");
055     (*obj->class->method[msg])(obj, arg_ptr);
056     va_end(arg_ptr);
057 }
058
059 /* free the data allocated for an object */
060 void free_object(OBJECT *obj)
061 {
062     free(obj->data);
063 }

Listing 4

001 /* interface to utility module */
002
003 extern int g_white;
004 extern int g_black;
005
006 void fatal(char *s);
007 void g_init(void);
008 void cleanup(void);
009 void g_circle(int y, int x, int radius, int color);
010 void g_square(int y, int x, int size, int color);

Listing 5

001 #include <stdio.h>
002 #include <stdlib.h>
003 #include <stdarg.h>
004 #include "utility.h"
005 #ifdef __ZTC__
006 #include <fg.h>
007 #else
008 #include <graph.h>
009 #endif
010
011 int g_white;
012 int g_black;
013
014 void fatal(char *s)
015 {
016     printf("FATAL ERROR: %s\n", s);
017     exit(1);
018 }
019
020 void trace(char *fmt, ...)
021 {
022     static FILE *outfp = NULL;
023     va_list arg_ptr;
024     va_start(arg_ptr, fmt);
025     if (outfp == NULL) {
026         unlink("tf");
027         if ((outfp = fopen("tf", "w")) == NULL)
028             fatal("fopen failed\n");
029         setbuf(outfp, NULL);
030     }
031     vfprintf(outfp, fmt, arg_ptr);
032     va_end(arg_ptr);
033 }
034
035 /* utility function to put screen in graphics mode */
036 void g_init(void)
037 {
038 #ifdef __ZTC__
039     fg_init_all();
040     g_white = FG_WHITE;
041     g_black = FG_BLACK;
042 #else
043     struct videoconfig this_screen;
044     _getvideoconfig(&this_screen);
045     switch (this_screen.adapter)
046     {
047     case _CGA:
048     case _OCGA:
049         _setvideomode(_HRESBW);
050         break;
051     case _EGA:
052     case _OEGA:
053         _setvideomode(_ERESCOLOR);
054         break;
055     case _VGA:
056     case _OVGA:
057     case _MCGA:
058         _setvideomode(_VRES2COLOR);
059         break;
060     case _HGC:
061         _setvideomode(_HERCMONO);
062         break;
063     default:
064         printf("This program requires a CGA, EGA, MCGA,");
065         printf("VGA, or Hercules card\n");
066         exit(0);
067     }
068     g_white = _getcolor();
069     g_black = 0;
070
#endif
071 }
072
073 /* utility function - wait for a key so we can see
074    graphics, set video mode back to character mode */
075 void cleanup()
076 {
077     int ch;
078     ch = getchar();
079 #ifdef __ZTC__
080     fg_term();
081 #else
082     _setvideomode(_DEFAULTMODE);
083 #endif
084 /*lint -esym(550,ch) */
085 }
086 /*lint +esym(550,ch) */
087
088 void g_circle(int y, int x, int radius, int color)
089 {
090 #ifdef __ZTC__
091     fg_drawarc((fg_color_t)color, FG_MODE_SET, ~0, x, y,
092         radius, 0, 3600, fg_displaybox);
093 #else
094     _setcolor(color);
095     _ellipse(_GBORDER, x - radius, y - radius, x + radius,
096         y + radius);
097 #endif
098 }
099
100 void g_square(int y, int x, int size, int color)
101 {
102 #ifdef __ZTC__
103     int hs;
104     fg_box_t box;
105     hs = size / 2;
106     box[FG_X1] = x - hs;
107     box[FG_Y1] = y - hs;
108     box[FG_X2] = x + hs;
109     box[FG_Y2] = y + hs;
110     fg_drawbox((fg_color_t)color, FG_MODE_SET, ~0,
111         FG_LINE_SOLID, box, fg_displaybox);
112 #else
113     int hs;
114     hs = size / 2;
115     _setcolor(color);
116     _rectangle(_GBORDER, x - hs, y - hs, x + hs, y + hs);
117 #endif
118 }

Listing 6

001 #include <stdio.h>
002 #include <stdlib.h>
003 #include <stdarg.h>
004 #include "utility.h"
005 #include "obj.h"
006
007 /* methods for graphical_object, circle, double_circle, square */
008 #define INIT 0
009 #define DRAW 1
010 #define MOVE 2
011
012 /********************************************************/
013 /* CLASS GRAPHICAL OBJECT */
014
015 struct graph_obj_s {
016     int y;
017     int x;
018 };
019
020 typedef struct graph_obj_s GRAPH_OBJ_T;
021 #define GRAPH_OBJ_SIZE sizeof(GRAPH_OBJ_T)
022 #define GRAPH_OBJ_OFFSET 0
023
024 /* graph_obj_init(object, y_position, x_position); */
025 void graph_obj_init(OBJECT *obj, va_list arg_ptr)
026 {
027     GRAPH_OBJ_T *g;
028     g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
029         GRAPH_OBJ_OFFSET);
030     g->y = va_arg(arg_ptr, int);
031     g->x = va_arg(arg_ptr, int);
032 }
033
034 /* graph_obj_move(object, distance_y, distance_x);
    */
035 void graph_obj_move(OBJECT *obj, va_list arg_ptr)
036 {
037     GRAPH_OBJ_T *g;
038     g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
039         GRAPH_OBJ_OFFSET);
040     message(obj, DRAW, g_black);
041     g->y += va_arg(arg_ptr, int);
042     g->x += va_arg(arg_ptr, int);
043     message(obj, DRAW, g_white);
044 }
045
046 /********************************************************/
047 /* CLASS CIRCLE */
048
049 struct circle_s {
050     int radius;
051 };
052
053 typedef struct circle_s CIRCLE_T;
054 #define CIRCLE_SIZE sizeof(CIRCLE_T) + GRAPH_OBJ_SIZE
055 #define CIRCLE_OFFSET sizeof(GRAPH_OBJ_T)
056
057 /* circle_init(object, y_position, x_position, radius); */
058 void circle_init(OBJECT *obj, va_list arg_ptr)
059 {
060     CIRCLE_T *c;
061     graph_obj_init(obj, arg_ptr);
062     (void)va_arg(arg_ptr, int);
063     (void)va_arg(arg_ptr, int);
064     c = (CIRCLE_T *)((unsigned char *)obj->data + CIRCLE_OFFSET);
065     c->radius = va_arg(arg_ptr, int);
066     message(obj, DRAW, g_white);
067 }
068
069 /* circle_draw(object, color); */
070 void circle_draw(OBJECT *obj, va_list arg_ptr)
071 {
072     int color;
073     CIRCLE_T *c;
074     GRAPH_OBJ_T *g;
075     c = (CIRCLE_T *)((unsigned char *)obj->data + CIRCLE_OFFSET);
076     g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
077         GRAPH_OBJ_OFFSET);
078     color = va_arg(arg_ptr, int);
079     /* g_circle(g->y, g->x, c->radius, va_arg(arg_ptr, int)); */
080     g_circle(g->y, g->x, c->radius, color);
081 }
082
083 /********************************************************/
084 /* CLASS SQUARE (very similar to CIRCLE) */
085
086 struct square_s {
087     int size;
088 };
089
090 typedef struct square_s SQUARE_T;
091 #define SQUARE_SIZE sizeof(SQUARE_T) + GRAPH_OBJ_SIZE
092 #define SQUARE_OFFSET sizeof(GRAPH_OBJ_T)
093
094 /* square_init(object, y_position, x_position, size); */
095 void square_init(OBJECT *obj, va_list arg_ptr)
096 {
097     SQUARE_T *s;
098     graph_obj_init(obj, arg_ptr);
099     (void)va_arg(arg_ptr, int);
100     (void)va_arg(arg_ptr, int);
101     s = (SQUARE_T *)((unsigned char
        *)obj->data + SQUARE_OFFSET);
102     s->size = va_arg(arg_ptr, int);
103     message(obj, DRAW, g_white);
104 }
105
106 /* square_draw(object, color); */
107 void square_draw(OBJECT *obj, va_list arg_ptr)
108 {
109     SQUARE_T *s;
110     GRAPH_OBJ_T *g;
111     s = (SQUARE_T *)((unsigned char *)obj->data + SQUARE_OFFSET);
112     g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
113         GRAPH_OBJ_OFFSET);
114     g_square(g->y, g->x, s->size, va_arg(arg_ptr, int));
115 }
116
117 /********************************************************/
118 /* CLASS DOUBLE CIRCLE (sub-class of CIRCLE) */
119
120 #define DOUBLE_CIRCLE_SIZE CIRCLE_SIZE
121
122 /* double_circle_draw(object, color); */
123 void double_circle_draw(OBJECT *obj, va_list arg_ptr)
124 {
125     int color;
126     CIRCLE_T *c;
127     GRAPH_OBJ_T *g;
128     c = (CIRCLE_T *)((unsigned char *)obj->data + CIRCLE_OFFSET);
129     g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
130         GRAPH_OBJ_OFFSET);
131     color = va_arg(arg_ptr, int);
132     g_circle(g->y, g->x, c->radius, color);
133     g_circle(g->y, g->x, c->radius - 2, color);
134 }
135
136 /********************************************************/
137
138 int main(int argc, char **argv);
139 int main(int argc, char **argv)
140 {
141     int x;
142
143     CLASS graph_obj;
144     CLASS circle;
145     CLASS square;
146     CLASS double_circle;
147
148     OBJECT c1;
149     OBJECT s1;
150     OBJECT dc1;
151
152     g_init();
153
154     /* make class graphical object */
155     new_class(&graph_obj, NULL, 3, GRAPH_OBJ_SIZE);
156     reg_method(&graph_obj, INIT, graph_obj_init);
157     reg_method(&graph_obj, MOVE, graph_obj_move);
158
159     /* make class circle */
160     new_class(&circle, &graph_obj, 3, CIRCLE_SIZE);
161     reg_method(&circle, INIT, circle_init);
162     reg_method(&circle, DRAW, circle_draw);
163
164     /* make class square */
165     new_class(&square, &graph_obj, 3, SQUARE_SIZE);
166     reg_method(&square, INIT, square_init);
167     reg_method(&square, DRAW, square_draw);
168
169     /* make class double_circle */
170     new_class(&double_circle, &circle, 3,
        DOUBLE_CIRCLE_SIZE);
171     reg_method(&double_circle, DRAW, double_circle_draw);
172
173     /* make a circle object */
174     new_object(&c1, &circle);
175     message(&c1, INIT, 40, 40, 20);
176
177     /* make a square object */
178     new_object(&s1, &square);
179     message(&s1, INIT, 40, 100, 20);
180
181     /* make a double circle object */
182     new_object(&dc1, &double_circle);
183     message(&dc1, INIT, 40, 160, 20);
184
185     for (x = 0; x < 100; ++x) {
186         message(&c1, MOVE, 1, 1);
187         message(&s1, MOVE, 1, 0);
188         message(&dc1, MOVE, 0, -1);
189     }
190
191     free_object(&c1);
192     free_object(&s1);
193     free_object(&dc1);
194
195     free_class(&graph_obj);
196     free_class(&circle);
197     free_class(&square);
198     free_class(&double_circle);
199
200     cleanup();
201
202     return (0);
203 }

Listing 7

001 #include <stdio.h>
002 #include <stdlib.h>
003 #include "utility.h"
004
005 /*********************************************************/
006 /* CLASS GRAPHICAL OBJECT */
007
008 class graph_obj {
009 public:
010     int y;
011     int x;
012     void init(int y, int x);
013     void move(int y, int x);
014     virtual void draw(int color){};
015 };
016
017 void graph_obj::init(int y2, int x2)
018 {
019     y = y2;
020     x = x2;
021 }
022
023 void graph_obj::move(int y_delta, int x_delta)
024 {
025     draw(g_black);
026     x += x_delta;
027     y += y_delta;
028     draw(g_white);
029 }
030
031 /*********************************************************/
032 /* CLASS CIRCLE */
033
034 class circle: public graph_obj {
035 public:
036     int radius;
037     void init(int y, int x, int radius);
038     void draw(int color);
039 };
040
041 void circle::init(int y2, int x2, int radius2)
042 {
043     graph_obj::init(y2, x2);
044     radius = radius2;
045     draw(g_white);
046 }
047
048 void circle::draw(int color)
049 {
050     g_circle(y, x, radius, color);
051 }
052
053 /*********************************************************/
054 /* CLASS SQUARE */
055
056 class square: public graph_obj {
057 public:
058     int size;
059     void init(int y, int x, int size);
060     void draw(int color);
061 };
062
063 void square::init(int y2, int x2, int size2)
064 {
065     graph_obj::init(y2, x2);
066     size = size2;
067     draw(g_white);
068 }
069
070 void square::draw(int color)
071 {
072     g_square(y, x, size, color);
073 }
074
075 /*********************************************************/
076 /* CLASS DOUBLE_CIRCLE */
077
078 class double_circle: public circle {
079 public:
080     void draw(int color);
081 };
082
083 void double_circle::draw(int color)
084 {
085     g_circle(y, x, radius, color);
086     g_circle(y, x, radius - 2, color);
087 }
088
089 /********************************************************/
090
091 int main(void);
092 int main(void)
093 {
094     int x;
095     circle c1;
096     square s1;
097     double_circle dc1;
098     g_init();
099     c1.init(40, 40, 20);
100     s1.init(40, 100, 20);
101     dc1.init(40, 160, 20);
102     for (x = 0; x < 100; ++x) {
103         c1.move(1, 1);
104         s1.move(1, 0);
105         dc1.move(0, -1);
106     }
107     cleanup();
108     return (0);
109 }

Tools For MS-DOS Directory Navigation

Leor Zolman

Leor Zolman wrote "BDS C", the first C compiler designed exclusively for personal computers. Since then he has designed and taught programming workshops and has also been involved in personal growth workshops as both participant and staff member. He STILL doesn't hold any degrees. His latest incarnation is as a CUJ staff member.

As an MS-DOS user with a large amount of hard disk space to manage, I frequently find myself cd-ing all over the system in pursuit of source files and data. The standard MS-DOS command processor COMMAND.COM has a bare-bones repertoire of options for facilitating system navigation, and it is full of idiosyncrasies. For instance, to change directly to an arbitrary drive and user area, the user must enter the drive selector and the path specification as two separate commands. Switching from the root directory of drive C: to the \work directory on drive D: requires the command sequence:

C:\>d:          (select D:)
D:\>cd work     (change to the desired directory)
D:\WORK>...
(All examples assume the PROMPT environment variable is set to $p$g so that COMMAND.COM will display the current path as part of the system prompt.) If the user attempts to select a different drive and a path with one command, he will find that apparently nothing has happened:

C:\>cd d:\work
C:\>...

Actually, the system has selected the specified path to be active on the specified drive, but the specified drive has not been made current. The system maintains a current working directory for each logical drive. If the user were then to select that other drive, i.e.,

C:\>d:
D:\WORK>...

then the selected path would show up as the current directory. Another "missing" feature in the standard command environment is a simple directory-name aliasing mechanism, so that one can switch quickly to commonly-used directories even if the path name happens to be lengthy. MS-DOS does provide a simplistic facility (the subst command) to relate an arbitrary path to a new drive designator, but subst isn't really adequate: the alias name is limited to a single letter, and there is no facility for viewing all active assignments. I would prefer the ability to assign arbitrary mnemonics to arbitrary paths, and to have those mnemonics recognized when specified in cd commands. I would also like some clean mechanism for instantly switching to the previous directory -- even if I've forgotten what it was.

The Answer

To address these needs, I wrote an extended cd command that supports combined drive and path specifications, and a companion command that returns the user to the previous directory (taking the directory specification from information recorded in an environment variable by the extended cd command). The cd replacement stores the old full path name in an environment variable before switching to the new specified path, and the companion command reads this environment variable and returns to the original directory upon request.
Since the extended cd must modify its parent's environment, it uses the functions for modifying the master environment that appeared in the July 1989 issue of CUJ. CDE (for CD Extended) works like MS-DOS's cd command, except for the following special cases: When both a drive designator and a path name are specified, the specified drive is immediately selected together with the path. When the argument is identified as the name of an existing MS-DOS environment variable, the named variable is assumed to contain a path name, which is substituted as the path to switch to. In support of the "return to previous directory" feature, I decided to implement a "directory stack" mechanism. This stack is maintained via environment variables, and the user may select the naming convention for those variables by customizing the #define statements in CDERET.H. (See Listing 1.) One master environment variable (I call it CHAINS) specifies the maximum size of the directory stack. When CDE is first invoked, it checks whether the CHAINS variable has been previously defined in the environment; if so, its current value is used. If not, CDE initializes CHAINS to a default value (also specified by a definition in the header file). Thus, the user has the option of setting the value of CHAINS explicitly (using the standard built-in set command) or allowing CDE to handle the initialization of CHAINS automatically. (See Listing 2.) A "stack" of size CHAINS is represented by a set of environment variables sharing a common base name (I use CHAIN) with position numbers appended. Thus, with CHAINS=3, after several CDEs the environment variables CHAIN1, CHAIN2, and CHAIN3 would exist to store the pertinent path names in the environment. Every time CDE is used to change directories, it "pushes" the old current working directory "on the stack" by reassigning all the relevant environment variables. CHAIN1 is always the top of the stack; CHAINn (where n = CHAINS) is the base.
Since there is no disk activity involved, this process is quite fast. The RET command (Listing 3) "returns" to the previous directory (either specified by CHAIN1 or undefined), then "pops" the stack by reassigning all the active environment variables in reverse order. As long as CHAINS is greater than 1, the directory stack behaves as described above and successive uses of RET unravel the stack. When CHAINS is set to 1, RET considers this a special case: after returning to the directory specified by CHAIN1, CHAIN1 is reset to the name of the directory that was current at the time of the RET call. Thus, repeated uses of RET with CHAINS equal to 1 effect a "toggle" between two directories. Depending on the way your system is organized, this toggling mechanism may be more useful to you than the directory stack mechanism.

Icing

The directory aliasing feature is activated by simply setting an environment variable to the full path desired, then using that environment variable name as a parameter to CDE. For example,

    C:\>set WORK=d:\project\subproj\new\testing
    C:\>cde work
    D:\PROJECT\SUBPROJ\NEW\TESTING>...

As a special case, for convenience, giving the CDE command without any arguments will cause CDE to look for a special environment variable (I call it HOME) and switch to the directory it specifies. If you spend much of your time headquartered at one particular directory, this is an easy way to go back to it from anywhere in the system, regardless of the state of the directory stack. The current directory at the time this special form of CDE is given will, as usual, be recorded in the environment by CDE in case you want to use RET from the HOME directory. When setting environment variables in general, be careful not to type spaces between the end of the variable name and the = sign. DOS would keep the space as part of the variable name, and things wouldn't work.
The CDE program will handle spaces after the = sign (and before the text) with no problem, but it's probably safer to be consistent and use no spaces whatsoever.

Implementation

Both CDE.C and RET.C have two phases of operation: the first phase performs the required drive/directory selection, and the second phase updates the related environment variables. If the first phase fails, then the programs exit immediately; there's no need to update environment variables if the current directory wasn't changed.

To obtain the name of the target directory in phase one, RET simply accesses the CHAIN1 environment variable. If the variable does not exist, then CDE has never been run and an appropriate message is displayed. If CHAIN1 exists, it specifies the target path. CDE.C gets its target path name from the command line. If the name happens to be the name of an active environment variable, then the value of the variable with that name is used to obtain the target path.

The directory selection process itself is identical for both commands and takes two steps: the selection of the logical drive and the selection of the desired directory. The drive is selected first; if that fails, we quit and no harm has been done. Once the new drive has been selected, then the new path is selected. If that fails, we have to go back and reinstate the original drive. If it succeeds, we're done with phase one.

Phase two for RET.C is relatively straightforward. If CHAINS is equal to 1, then the CHAIN1 environment variable is set to the original current directory name (before phase one) in order to support the toggling feature. For other values of CHAINS, the directory stack is "popped" by looping to reassign each CHAINn variable to the value of its next higher counterpart.

CDE's phase two begins by making sure the CHAINS environment variable, used to specify the stack size, is present and initialized. If it exists, its value is assigned to the program variable chaincnt.
If CHAINS does not exist, then it is initialized to the default value (specified by the symbolic string constant DEFAULT_CHAINS). Finally the directory stack is "pushed" by copying each CHAINn variable (for n = 1 to CHAINS-1) to its next higher counterpart. CHAIN1 is a special case; it is assigned to the name of the current directory before phase one was completed.

Configuration

The following symbolic constants may be changed to suit your own preferences:

    CHAINS_VAR      The master directory chain size control variable
    CHAIN_BASE      The "base" name of directory stack variables
    DEFAULT_CHAINS  The default value for CHAINS_VAR (in quotes)
    HOME_NAME       The name of the env. variable for home directory

The CDE.EXE and RET.EXE commands should be placed in a directory that is somewhere in your system search path. (I use c:\bin for all my personal utilities.)

System-Dependent Functions

The two areas of high compiler-dependency in this application, direct console I/O and DOS logical drive selection, have been isolated in a separate utility library named UTIL.C (Listing 4). The only support function required by the functions in UTIL.C is the bdos function typically supplied with most popular compiler libraries. If you need to write the bdos function yourself, the prototype is shown at the top of the UTIL.C source file. It takes an interrupt (int 21h) function number, a DX register value, and an AL register value as parameters (although the AL parameter is not needed for this application). The bdos function can easily be written in terms of any of the more general operating system interface functions (int86(), intdos(), etc.) you may have available. To keep the commands' .EXE file sizes as short as possible, all messages are displayed on the console using direct console I/O calls (through bdos facilities) so as not to require the file I/O support to be dragged into the linkages.
The UTIL.C functions cputs() and putch() are similar to their namesakes in the Microsoft library and are provided here for the benefit of users of compiler packages that do not include these functions. The setdrive() function I provide is cleaner than Microsoft's _dos_setdrive(). The library functions chdir() and getcwd() are used by the commands and should be available in your compiler's standard library. When compiled with optimization, both CDE.EXE and RET.EXE weigh in at just over 6K, so their load-and-run time is negligible.

Caveats

The following line in your CONFIG.SYS file will ensure plenty of environment space for the CHAIN variables:

    shell = c:\command.com /p /e:1500

Due to an as-yet-inexplicable MS-DOS anomaly, specifying too small a value for the environment (xxxx in /e:xxxx) may cause the system to hang up after CDE or RET completes execution. The message I've gotten says something about COMMAND.COM being "invalid". While this has never been destructive, it has required a re-boot of the system. The only way I've found (so far) to avoid this problem is to allocate plenty of extra environment space. If anyone has a more "bulletproof" solution, please let us know here at CUJ.

I highly recommend that one modification be made to the Master Environment package as listed in the 7/89 CUJ: the environment variable name should be converted to upper case in both the m_getenv() and m_delenv() functions. As written, only the m_putenv() function converts the name to upper case, and this causes failure when either m_getenv() or m_delenv() is called with lower-case variable names. To make this change, alter the lines reading:

    n = name;

to:

    n = strupr(name);

There is one such line near the beginning of both the m_getenv() and m_delenv() functions.

Linking

The commands to compile and link CDE.C and RET.C using Microsoft C are shown at the top of the source file listings.
I arbitrarily named the master environment package ENVLIB.OBJ, so including envlib on the qcl command line links in the object module.

Summary

The CDE and RET commands provide a clean, quick and convenient mechanism for alleviating some of MS-DOS's command processor limitations. Although there are plenty of full-blown command processor replacements, shells and special-purpose TSRs out there (even for free) that offer alternative ways to "get around" your DOS system, few (if any) of these can offer 100% compatibility with all other packages and TSRs, zero bytes of system RAM overhead (unless you count the few extra bytes of environment space required), and virtually instantaneous gratification. And you even get the source code!

Listing 1

/*
 * UTIL.H: Includes and definitions for the CDE/RET
 * Directory Navigation utilities.
 */

#define MAX_DIRNAME_SIZE   100      /* longest conceivable directory name size */
#define MAX_EVARNAME_SIZE  20       /* max length of env. var. names created */
#define DEFAULT_CHAINS     "1"      /* initial default dir. stack size */
#define CHAINS_VAR         "CHAINS" /* name of env. var. controlling stack size */
#define CHAIN_BASE         "CHAIN"  /* base name of env. vars holding dir names */
#define HOME_NAME          "HOME"   /* Name of 'home dir' environment variable */

/*
 * Prototypes for utility functions in CDERET.C:
 */
void error(char *msg);
int cputs(char *txt);
int putch(char c);
int setdrive(int drive_no);
int getdrive();
void change_dir(char *newpath);

/*
 * Prototypes for Master Environment Control routines
 * (functions from CUJ 7/89)
 */
char *m_getenv(char *name);
int m_putenv(char *name, char *text);
int m_delenv(char *name);

Listing 2

/*
 * CDE.C: Extended "cd" command for MS-DOS.
 * Written by Leor Zolman, 9/20/89
 *
 * Features:
 *  1) Allows changing to another drive and directory in one step
 *  2) Supports directory aliasing through environment variables
 *  3) With no arguments, optionally switches to 'home' directory
 *     (if the HOME environment variable is currently defined)
 *  4) Manages a "previous directory" stack through environment
 *     variables. The number of entries in the stack is dynamically
 *     configurable through a special controlling environment variable.
 *  5) For the special case of stack size = 1, toggling back and forth
 *     between two directories is supported
 *
 * Usage:
 *  cde [d:] [path]  (changes to given drive/directory)
 *  cde <envvar>     (indirect dir change on environment variable)
 *  cde              (changes to HOME directory, if defined, or
 *                    returns current working directory otherwise)
 *
 * Compile/Link:
 *  cl /Ox cde.c util.c envlib   (where ENVLIB.OBJ is Master Env. Pkg.)
 *
 * Uses the Master Environment library from CUJ 7/89.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <direct.h>
#include "util.h"

main(int argc, char **argv)
{
    char *pathp;
    char cwdbuf[MAX_DIRNAME_SIZE];          /* buffer for current dir name */
    int chaincnt;                           /* size of dir stack */
    char chaincnt_txt[10], *chaincntp;
    char chnevar1[MAX_EVARNAME_SIZE],       /* env var names built here */
         chnevar2[MAX_EVARNAME_SIZE];
    char chndname_save[MAX_DIRNAME_SIZE], *chndname;
    char itoabuf[10];                       /* used by itoa() function */
    int i;

    /* Get current dir. name and current drive: */
    getcwd(cwdbuf, MAX_DIRNAME_SIZE);

    if (argc == 1)                          /* if no args given, */
        if (pathp = m_getenv(HOME_NAME))    /* if HOME directory defined, */
        {
            change_dir(pathp);              /* then try to change to it.
                                               */
            strcpy(chnevar1, CHAIN_BASE);   /* set top-stack env var */
            strcat(chnevar1, "1");
            if (m_putenv(chnevar1, cwdbuf)) /* to old dir */
                error("Error setting environment variable");
            return 0;
        }
        else
        {                                   /* just print current working dir */
            cputs(cwdbuf);
            putch('\n');
            return 0;
        }

    if (argc != 2)
        error("Usage: cde [d:][newpath] or <envvar>\n");

    pathp = argv[1];                        /* skip whitespace in pathname */
    if (chndname = m_getenv(pathp))         /* if env-var-name given, */
        pathp = chndname;                   /* use its value as new path */
    change_dir(pathp);

    /* Read or initialize master chain length variable: */
    if ((chaincntp = m_getenv(CHAINS_VAR)) == NULL)
        if (m_putenv(CHAINS_VAR,
                strcpy(chaincntp = chaincnt_txt, DEFAULT_CHAINS)))
            error("Error creating environment variable");

    /* Update the environment directory chain: */
    chaincnt = atoi(chaincntp);
    for (i = chaincnt; i > 0; i--)
    {
        /* construct name of previous dirname variable: */
        if (i != 1)
        {
            strcpy(chnevar2, CHAIN_BASE);
            strcat(chnevar2, itoa(i - 1, itoabuf, 10));
        }
        if (chndname = ((i != 1) ? m_getenv(chnevar2) : cwdbuf))
        {
            /* copy value of prev. to current */
            strcpy(chndname_save, chndname);    /* m_putenv() bashes it */
            strcpy(chnevar1, CHAIN_BASE);
            strcat(chnevar1, itoa(i, itoabuf, 10));
            if (m_putenv(chnevar1, chndname_save))
                error("Error setting environment variable");
        }
    }
    return 0;
}

Listing 3

/*
 * RET.C: Return to previous working directory
 * Written by Leor Zolman, 9/89
 *
 * (companion to CDE.C)
 * Uses the Master Environment package from CUJ 7/89
 *
 * Usage:
 *  ret     (returns to previous directory)
 *
 * Compile/Link:
 *  cl /Ox ret.c util.c envlib   (ENVLIB.OBJ is Master Environment pkg)
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <direct.h>
#include "util.h"

main(int argc, char **argv)
{
    char *pathp;
    char cwdbuf[MAX_DIRNAME_SIZE];
    int chaincnt;
    char chnevar1[MAX_EVARNAME_SIZE],   /* env var names built here */
         chnevar2[MAX_EVARNAME_SIZE];
    char chndname_save[MAX_DIRNAME_SIZE], *chndname;
    char itoabuf[10];                   /* used by itoa() function */
    int i;

    /* Get current dir.
       name and current drive: */
    getcwd(cwdbuf, MAX_DIRNAME_SIZE);

    if (argc != 1)
        error("Usage: ret   (returns to last dir cde'd from)");

    if ((pathp = m_getenv(CHAINS_VAR)) == NULL)
        error("cde hasn't been run yet");
    else
        chaincnt = atoi(pathp);

    /* See if CDE has created any entries: */
    strcpy(chnevar1, CHAIN_BASE);
    strcat(chnevar1, "1");
    if (!(pathp = m_getenv(chnevar1)))  /* if so, pathp points to last dir */
        error("No previous directory"); /* else no previous dir */

    change_dir(pathp);                  /* change to previous directory */

    /* Update the environment directory chain: */
    if (chaincnt == 1)                  /* special case: record old dir */
    {
        if (m_putenv(chnevar1, cwdbuf))
            error("Error setting environment variable");
        return 0;
    }

    for (i = 1; ; i++)
    {
        /* get names of current and next dirname variables */
        strcpy(chnevar1, CHAIN_BASE);
        strcat(chnevar1, itoa(i, itoabuf, 10));
        strcpy(chnevar2, CHAIN_BASE);
        strcat(chnevar2, itoa(i + 1, itoabuf, 10));
        if (!(chndname = m_getenv(chnevar2)))
            break;                      /* found end of saved chain */
        /* copy value of next higher to current */
        strcpy(chndname_save, chndname);    /* m_putenv() bashes it */
        if (m_putenv(chnevar1, chndname_save))
            error("Error setting environment variable");
    }
    return 0;
}

Listing 4

/*
 * UTIL.C: Utility functions for CDE/RET package
 *
 * These functions rely on the "bdos" library function
 * from your compiler's library.
 * Prototype:
 *
 *  int bdos(int dosfn, unsigned dosdx, unsigned dosal);
 */

#include <stdlib.h>
#include <ctype.h>
#include <direct.h>
#include "util.h"

/*
 * Print error msg and abort:
 */
void error(char *msg)
{
    cputs("cde: ");
    cputs(msg);
    putch('\n');
    exit(-1);
}

/*
 * Change to specified drive/path, terminate program on error:
 */
void change_dir(char *new_path)
{
    int old_drive;

    old_drive = getdrive();
    while (*new_path && isspace(*new_path))     /* skip whitespace */
        new_path++;
    if (new_path[1] == ':')                     /* if drive designator */
    {                                           /* given, then set drive */
        if (setdrive(tolower(*new_path) - 'a'))
            error("Can't select given drive\n");
        new_path += 2;
    }
    if (*new_path && chdir(new_path))   /* If path given, set new path. */
    {
        setdrive(old_drive);            /* If error, restore drive */
        error("Can't change to given path");
    }
}

/*
 * DOS functions, written in terms of the "bdos" function:
 */
int cputs(char *txt)                    /* display msg, console I/O only */
{
    char c;

    while (c = *txt++)
    {
        if (c == '\n')
            putch('\r');
        putch(c);
    }
    return 0;
}

int putch(char c)                       /* display a char on console */
{
    return bdos(2, c, 0);
}

int setdrive(int drive_no)              /* set logical drive. Return */
{                                       /* non-zero on error. */
    int after;

    bdos(0x0E, drive_no, 0);
    after = bdos(0x19, 0, 0);
    if ((after & 0xff) == drive_no)     /* low 8 bits are new drive no. */
        return 0;                       /* set correctly */
    else
        return -1;                      /* error */
}

int getdrive()                          /* return current logical drive */
{
    return bdos(0x19, 0, 0);
}

Dealing With Memory Allocation Problems

Dear Mr. Ward: I am not much of a letter writer, but after reading the July 89 issue of The C Users Journal I felt I could save some of your readers a lot of time tracking down a problem with the Microsoft C, version 5.10 memory allocation routines. Enclosed is a listing and the output from the program. This may help Steven Isaacson who is having memory allocation problems using Vitamin C. I found this problem after a week of tracking down a memory leak problem in a very large application.
My final solution was to write my own malloc()/free() routines that call DOS directly. This will let the DOS allocator do what it is supposed to do. No time penalty was noticed in our application. Note if you do write your own malloc()/free() routines, call them something else! MSC uses these routines internally and makes assumptions about what data is located outside the allocated area. I always use a malloc()/free() shell to test for things like memory leaks and the free of a non-allocated block. It also will give you an easy way to install a global 'out of memory' error handler.

The code supplied by Leonard Zerman on finding the amount of free space in a system is simplistic and very limited. A better routine would build a linked list of elements and then the variable vptrarray could be made a single pointer to the head of the list. The entire routine becomes dynamic, much more robust, and there is no danger of overflowing a statically allocated array. See the supplied code for an example. The linked list implementation has the side effect that it will work on a virtual memory system. Why you would want to do this is beyond me, but it could be considered a very time consuming way to find out what swapmax is set to on a UNIX system. If you have any questions, please contact me. My phone number is (408) 988-3818. My fax number is (408) 748-1424.

Sincerely yours, Jim Schimandle Primary Syncretics 473 Sapena Court, Unit #6 Santa Clara, CA 95054

Editor's Note: If you couldn't find "Listing 1" in last month's "We Have Mail", you needn't fear the onset of any perceptual disorder -- there was no Listing 1. Usually publishers blame this kind of problem on someone else -- the printer, the typesetter, the proofreader, the paste-up artist. Unfortunately this publisher doesn't have any convenient scapegoats; I pasted up the letters section (something I often do), and failed to include the listing. Anyway, here is the original letter and the promised listing.
This time it will be right -- my staff is doing it. --rlw

Listing 1

/*----------------------------------------------------------------------
++
  membug.c
  Demonstrate MSC malloc() large size problem

  Description
    membug.c demonstrates a problem that occurs when Microsoft C,
    version 5.10 is used to allocate and free large blocks of memory.
    If this program is compiled and run, you will find that the first
    list will have significantly more memory allocated to it. The
    second list will only have 1 to 2 elements allocated to it,
    depending on your memory layout.

    The basic problem is that MSC never deallocates a DOS allocated
    memory block, even if the memory call is about to fail. Thus, the
    first list causes the MSC runtime to allocate memory in 48K
    blocks. When the first list is freed, the 48K blocks remain.
    Then, when the second list is allocated, there are only 2 blocks
    that DOS can carve the 60K blocks from: the default memory
    segment and the last DOS memory block. The default memory segment
    is 64K, so we should always get an allocation from it. The last
    memory block can be expanded by DOS to fit the 60K request if
    your memory layout will allow it.

    Note that if you reverse the order of memory requests, both will
    return the same number of memory blocks because the 48K requests
    will fit in the 60K blocks.
  Compilation
    Compilation is under Microsoft C, version 5.1 using the command:

      cl /W3 /AL membug.c

  Execution
    Execution of the program should use the command line:

      membug > membug.out

+-
  $Log$
--
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dos.h>

/* Local definitions */
/* ----------------- */
#define FIRST_ALLOC_SIZE   48000
#define SECOND_ALLOC_SIZE  60000

/* Memory allocation list structure */
/* -------------------------------- */
typedef struct mb               /* Memory list node */
{                               /* ---------------- */
    struct mb *mb_next;         /* Pointer to next block */
    char mb_data;               /* Start of data area */
                                /* Actual data area size is */
                                /* determined by runtime */
                                /* malloc() argument */
} MEM_BLOCK;

/* Pointer conversion macros */
/* ------------------------- */
#define FARPTR_SEG(a)        ((int) (((unsigned long) (a)) >> 16))
#define FARPTR_OFF(a)        ((int) ((long) (a)))
#define MAKE_FARPTR(seg,off) ((void far *) ((((long) (seg)) << 16) + (off)))

/* Function prototypes */
/* ------------------- */
void main(void);
void DOS_Mem_Display(char *);

/*--------------------------------------------------------------------
+ main
  Entry point for MSC dynamic memory test

  Usage
    void main()

  Parameters
    None

  Description
    main() is the entry point for the Microsoft C dynamic memory
    test. The function allocates a list of FIRST_ALLOC_SIZE elements,
    frees the first list, allocates a second list of
    SECOND_ALLOC_SIZE, and frees the second list. The statistics
    printed out are the total bytes allocated by each allocation and
    a dump of the DOS memory list after each allocation/free.
  Notes
    None
-
*/
void main()
{
    MEM_BLOCK *list;
    MEM_BLOCK *p;
    long first_size;
    long second_size;

    /* Allocate list using first allocation size */
    /* ----------------------------------------- */
    list = NULL;
    first_size = 0;
    while ((p = (MEM_BLOCK *) malloc(FIRST_ALLOC_SIZE)) != NULL)
    {
        p->mb_next = list;
        list = p;
        first_size += FIRST_ALLOC_SIZE;
    }

    /* Print first allocation results */
    /* ------------------------------ */
    printf("***** First allocation - %ld *****\n\n", first_size);
    DOS_Mem_Display("After first allocation\n");

    /* Free first list */
    /* --------------- */
    while (list != NULL)
    {
        p = list;
        list = list->mb_next;
        free(p);
    }
    DOS_Mem_Display("After first free\n");

    /* Allocate list using second allocation size */
    /* ------------------------------------------ */
    list = NULL;
    second_size = 0;
    while ((p = (MEM_BLOCK *) malloc(SECOND_ALLOC_SIZE)) != NULL)
    {
        p->mb_next = list;
        list = p;
        second_size += SECOND_ALLOC_SIZE;
    }

    /* Print second allocation results */
    /* ------------------------------- */
    printf("***** Second allocation - %ld *****\n\n", second_size);
    DOS_Mem_Display("After second allocation\n");

    /* Free second list */
    /* ---------------- */
    while (list != NULL)
    {
        p = list;
        list = list->mb_next;
        free(p);
    }
    DOS_Mem_Display("After second free\n");
}

/*--------------------------------------------------------------*/

DOS_Mem_Display

        psp_seg = *(p+1) + ((*(p+2)) << 8);
        blk_paras = *(p+3) + ((*(p+4)) << 8);
        size = ((long) blk_paras) << 4;
        if (psp_seg == 0)
        {
            prg = (unsigned char far *) "(free)";
            total += size;
        }
        else
        {
            ip = (unsigned int far *) MAKE_FARPTR(psp_seg, 0x2c);
            prg = MAKE_FARPTR(*ip, 0);
            while (*prg != '\0')
            {
                prg += strlen((char *) prg) + 1;
            }
            prg += 3;
        }
        sprintf(str, "%5d %9ld %p", idx++, size, p);
        printf("%s\t%s\n", str, prg);
        if (*p == 'z')
        {
            break;
        }
        p = MAKE_FARPTR(FARPTR_SEG(p) + blk_paras + 1, 0);
    }
    sprintf(str, "Total Free: %ld", total);
    printf("%s\n\n", str);
}
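The malloc()/free() shell Mr. Schimandle recommends is straightforward to sketch in portable C. The names dbg_malloc() and dbg_free() below are my own illustration, not from the letter (which, as it warns, is exactly the point: don't reuse the names malloc and free); a real version might also record file and line information:

```c
#include <stdlib.h>

/* A minimal allocation-tracking shell in the spirit of the letter:
   wrap malloc()/free() under different names, keep a list of live
   blocks, and flag frees of pointers that were never allocated. */

typedef struct tracker {
    void *ptr;
    size_t size;
    struct tracker *next;
} tracker;

static tracker *live = NULL;        /* list of outstanding blocks */
static size_t live_bytes = 0;       /* total bytes not yet freed */

void *dbg_malloc(size_t size)
{
    tracker *t = malloc(sizeof *t);
    void *p = malloc(size);

    if (t == NULL || p == NULL) {
        /* a global 'out of memory' handler could be invoked here */
        free(t);
        free(p);
        return NULL;
    }
    t->ptr = p;
    t->size = size;
    t->next = live;
    live = t;
    live_bytes += size;
    return p;
}

int dbg_free(void *p)               /* returns -1 on free of unknown block */
{
    tracker **tp;

    for (tp = &live; *tp; tp = &(*tp)->next)
        if ((*tp)->ptr == p) {
            tracker *t = *tp;
            *tp = t->next;
            live_bytes -= t->size;
            free(t->ptr);
            free(t);
            return 0;
        }
    return -1;                      /* never allocated: a bug in the caller */
}

size_t dbg_leaked(void)             /* bytes still outstanding */
{
    return live_bytes;
}
```

Calling dbg_leaked() at program exit reports leaked bytes, and a non-zero return from dbg_free() catches frees of non-allocated blocks, the two failure modes the letter mentions.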
/*--------------------------------------------------------------------*/

Standard C Quiet Changes, Part I

P.J. Plauger

P.J. Plauger has been a prolific programmer, textbook author, and software entrepreneur. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standard committee.

A language standards committee can commit a variety of sins. It can eliminate existing features, so that existing programs that use them generate diagnostics with new translators. It can add lots of new features, so that existing programs trip over them and generate diagnostics. It can even redefine existing features, so that existing programs apparently misuse them and generate diagnostics. All of these are nasty things to do. A committee that indulges in such sins had better be prepared to justify its actions. Discarded features must be arguably dangerous, or at least not worth the clutter they cause by remaining in the language. Added features must fill a real need and not add to the clutter. Changed features require the most justification of all, since they cause the greatest disturbance.

So long as changes cause diagnostics, however, you can live with them. Even if you have to convert half a million lines of existing C code, you know how to proceed. Stuff your code through the new translator and see where it gripes. For very common gripes, you can often contrive a global edit that will mechanically fix up the code. For the rest, you at least have your attention forcibly directed to the areas where you must manually intervene.

The worst sin of all for a language standards committee is to make a change that does not cause a diagnostic. You have a working program with your existing C translator. You upgrade to a standard C compiler and your program quietly recompiles. The only problem is, it behaves differently. That is a project manager's worst nightmare. Even if you generally like the new behavior, you have a serious problem on your hands.
That half a million lines of existing C code may change its behavior in only a handful of places. You cannot rashly assume that the new behavior is acceptable every place. (Probably it is not.) You need to locate every place and check the implications of the change.

Committee X3J11 dubbed such alterations "quiet changes." We blanched every time we faced the prospect of introducing one. We did our best to avoid them. Nevertheless, we occasionally found compelling reasons to adopt quiet changes along with various other subtle but noisy changes. So we made sure that we documented every quiet change we made in the Rationale that accompanies the Standard.

I discussed the most ambitious of these quiet changes last year. (See "Standard C Promotes Types According to Value Preserving Rules," CUJ August '88.) The rules for mixing signed and unsigned integer operands in an expression were, in the past, both subtle and varied. The Committee discussed the different approaches at length before choosing a particular set of "promotion" rules. I did my best to present all the arguments and to justify the choice we eventually made.

This column and the next endeavor to summarize all of the quiet changes made in Standard C. They may not affect you because there have been numerous dialects of C in past practice. (That's a principal reason for making a language standard, to eliminate dialects.) We labeled something a quiet change if any significant dialect of C quietly changed meaning. The change may not affect your favorite dialect. Nevertheless, you should be aware of any possibility of a quiet change in C code. Who knows, you may already have a lurking problem in code moved from a different implementation of C.

In the explanations that follow, I have copied the description of each quiet change almost verbatim from the Rationale for Standard C. They appear in the same order as in the Rationale, which reflects the order of topics presented in the Standard.
The Quiet Changes

"Programs with character sequences such as ??! in string constants, character constants, or header names will now produce different results." For example,

    printf("You said what??!\n");

quietly becomes

    printf("You said what|\n");

This is the result of introducing trigraphs. The committee felt a compelling need to provide a way to represent certain characters unavailable in EBCDIC or the invariant subset of ISO 646. (The characters are [\]^{|}~#.) The alternate forms had to be representable using just the common subset of characters. They also had to be usable within character constants, string literals, and header names. Since existing programs can conceivably contain an arbitrary sequence of characters in these places, we had no way to satisfy these basic requirements without introducing the possibility of a quiet change.

We settled on trigraphs, or three-character sequences, as a compromise. Digraphs might be easier to type, but were more likely to change the meaning of older programs. (C uses all of the characters in the subset, so even code outside quotes and headers is endangered.) Each trigraph begins with two question marks, to minimize the chance of a quiet change. It ends with a character from the subset that is designed more or less to suggest the replacement character. Nobody pretends that ??< is a highly readable alternative to {. But then nobody prevents you from filtering your C code before you send it to a printer. (You might, for example, overstrike a left parenthesis and a minus sign to print a left brace instead of printing the actual trigraph.) Trigraphs serve the limited purpose of providing a minimal interchange standard for shipping C between countries. (Even the Danes, who are adamant that trigraphs are insufficient, have offered no alternative to their use within quotes and header names.)
"A program that depends upon internal identifiers matching only in the first (say) eight characters may change to one with distinct objects for each variant spelling of the identifier." For example,

    int get_stuff_DEF;
    f()
    {
        extern int get_stuff_REF;
        return (get_stuff_REF);
    }

A clever programmer may expect that all the names beginning with get_stuff refer to the same data object. That is no longer true.

There was widespread support for longer names in C. The eight-character significance limit inherited from Ritchie's original implementation is certainly inadequate. Worse, implementations differed on the treatment of "insignificant" characters in a name. (Is an implementation obliged to ignore the extra characters when comparing names? Or is it merely permitted to ignore them?) Further confusing the issue was the distinct, and more severe, limit on external names imposed by old-fashioned linkers.

The committee decided on a three-tiered limitation on names. First, any name can be as long as a logical line. An implementation can choose to inspect all characters when comparing names. Second, an implementation must inspect at least the first 31 characters. It can choose to look at no more than 31 characters. Finally, an implementation may require that external names differ in the first six characters, and ignore case distinctions.

These rules were adopted despite a few notorious cases cited of existing programs that would quietly change. It seems that some implementations ignore characters after the first eight. Some programmers have made a practice of intentionally punning by writing distinct names that are intended to compare equal. I don't recall the rationale for this practice and I don't care. The practice is sufficiently barbaric that it garners little sympathy, even if it can be the victim of a quiet change.
"A program relying on file scope rules may be valid under block scope rules but behave differently -- for instance, if d_struct were defined as type float rather than struct data in the following example:"

    typedef struct data { /* ... */ } d_struct;
    first()
    {
        extern d_struct func();
        /* ... */
    }
    second()
    {
        d_struct n = func();
    }

(This example from the Rationale is not wonderful. I even had to fix a small bug in reproducing it here.) At issue here is the clash between C as a block scoped language and C as a "flat" language with separately compiled modules. The former requires that names be forgotten at the end of the scope in which they are defined. The latter requires that external names be remembered and matched up across separate compilations.

Past implementations differ widely on the treatment of extern declarations within function bodies. Do such declarations percolate out, a block at a time, to file level so they can be matched up with any other file-level declarations for the same name? Or does each such declaration form a worm-hole out to the linker, with the worm-hole forgotten at the end of the block? Or does something even more bizarre occur?

The example above can give different results with different interpretations. In the first case, the declaration of func percolates out from the first function. It is then visible within the second function, so the assignment makes sense. In the second case, the declaration of func goes out of scope at the end of the first function. The second function must assume that func is implicitly declared as an external function returning int. In this case, you get a diagnostic. But change the type definition to float, as the Rationale suggests, and you get a quiet (but erroneous) conversion across the assignment. Like the previous issue on identifier lengths, here is a case where a quiet change is essentially unavoidable. Existing dialects differ too much for the standard to contain a common subset of behavior.
What the committee chose, in fact, was the second behavior. C is a block structured language with holes blown in it. A translator can diagnose conflicting external declarations within a translation unit. It can also elect not to do so, since this is a case of "undefined behavior." A linker can diagnose conflicts between separate compilations. It can also elect not to do so. In practice, most compilers and few linkers will choose to diagnose such conflicts. "Unsuffixed integer constants may have different types. In K&R, unsuffixed decimal constants greater than INT_MAX, and unsuffixed octal or hexadecimal constants greater than UINT_MAX, are of type long." For example, on an implementation where type int occupies 16 bits,

f(32768);           /* argument now 16 bits */
i = 0xFFFF / -10;   /* divide now unsigned */

This is part of the fallout of choosing value-preserving rules for promoting types in expressions (discussed later). The committee felt obliged to tidy up the typing rules for integer constants, to maintain a consistent philosophy toward preserving the expected value of a sub-expression. Ritchie's original rules required that 32768 have type long on an implementation where type int occupies 16 bits. That led to occasional surprises, particularly when writing arguments on function calls. (There were no function prototypes in those days to fix up or diagnose improper argument types.) With value-preserving promotion rules, however, you get the expected result more often by making 32768 type unsigned int. And such a choice is more consistent with the basic philosophy of choosing the "cheapest" type that preserves the value of an expression. Similarly, octal and hexadecimal integer constants are expected to be unsigned. It is silly for one to lose its unsignedness just because its value is too large to be represented as type int. Consistency requires that 0x10000 (on an implementation where type int occupies 16 bits) have type unsigned long instead of long.
In both cases, you can contrive programs that quietly change meaning with the change of typing rules for integer constants. The committee felt, however, that such programs were already at risk in being moved among existing dialects, which supported a variety of promotion rules. "A constant of the form '\078' is valid, but now has different meaning. It now denotes a character constant whose value is the (implementation-defined) combination of the values of the two characters '\07' and '8'. In some implementations the old meaning is the character whose code is 078 == 64." This is a consequence of now disallowing the digits 8 and 9 in octal escape sequences. Even the earliest C compilers have tolerated the practice, and more than a few programs have taken advantage of this tolerance. Nevertheless, the committee felt it was sufficiently barbarous that it had to be dropped. (The committee did not revoke the even more barbarous license to write 111l in place of 111L.) "A constant of the form '\a' or '\x' now may have different meaning. The old meaning, if any, was implementation defined." For example, char letter = 'a'; if (letter == '\a') /* no longer same as 'a' */ The backslash is no longer ignored in front of an arbitrary letter. Worse, Standard C now gives special meaning to \a. The committee felt obliged to add to the list of character escape sequences. The sequence \a stands for the "alert" character. In ASCII, it is the BEL code that rings the bell on old Teletype terminals and makes some sort of electronic beep on modern ones. The sequence \x signals the start of a hexadecimal escape sequence of arbitrary length. Neither of these escape sequences was officially defined in the past. There was the general promise that a backslash before a character with no magic meaning simply stood for that character. (I had, in fact, written a number of strings that used \x as a place holder to be filled in. That was my tough luck.) 
Nevertheless, the addition of \a and \x could cause a quiet change. "A string of the form "\078" is valid, but now has different meaning." See above for the same issue with character constants. The only difference is that the string literal gets longer. Character constants pack all the character codes into a single int value, in an implementation-defined manner. "A string of the form "\a" or "\x" now has different meaning." See above for the same issue with character constants. "It is neither required nor forbidden that identical string literals be represented by a single copy of the string in memory; a program depending upon either scheme may behave differently." For example,

char *s = "abc";
...
if (s != &"abc"[0])
    printf("s has changed\n");

The printed message is correct only if both instances of "abc" become the same data object. This is not guaranteed in Standard C. Here is another case where existing dialects of C were in conflict. Some dialects guarantee that identical string literals are represented by a single copy within a translation unit. Others guarantee that each string literal occupies distinct storage. The committee chose to leave the choice up to the implementation. It is "unspecified," so the implementation need not document the choice or even be consistent in how it chooses. (Another example of unspecified behavior is the order in which a program evaluates multiple arguments on a function call.) Naturally, any program that depends on some particular behavior is likely to be disappointed by some conforming implementation. "Expressions of the form x=-3 change meaning with the loss of the old-style assignment operators." For example,

i =-3;    /* now stores -3 */

It has been many years since UNIX C reversed the assignment operators. Where now you write -= you once wrote =-, as in the example above. Programmers who are stingy or haphazard with spacing around operators got burned often enough that Ritchie switched C to match the Algol 68 convention.
Nevertheless, a number of commercial C compilers retained the old forms for backward compatibility with early C code. Disallowing the old forms can, of course, lead to all sorts of nasty puns. Those who didn't bite the bullet back in the seventies must do so now. Intermission That's about half of the quiet changes documented in the Rationale for the C standard. Tune in next month for the rest of the story. Doctor C's Pointers (R) Header Design And Management Rex Jaeschke Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementers of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA, 22091 or via UUCP at uunet!aussie!rex. All too often, programs just "happen." There is little, if any, serious design done, and programmers "design on the fly," using an approach I call stepwise refinement. That is, you code a bit and test it, then iteratively refine it until it's somewhere close to what you think you want. And after you have hard-coded the same macro definitions and function declarations in ten different places, you think perhaps it would be a good idea to create a header instead. However, this either doesn't get done or it's done at the local level to solve just the particular problem in the code you are currently working on. For the most part, I find people program defensively. Designing and managing headers is an integral part of C project design. It must be done before any code is written to ensure that the design is consistent, can be managed easily, and that a high degree of quality assurance can result. The lack of properly designed headers is a likely recipe for added development, debugging, and maintenance time, as well as significantly reduced reliability.
There are many aspects to designing headers. In this article I will look at those I've recognized. However, before I begin, a definition of the term header is in order. I think you all know what a header is, but for the purposes of this discussion, I will consider a header to be a collection of declarations that can be shared across multiple source files via the #include preprocessing directive. And while a header is typically represented as a source code file on disk, it need not exist as such. For example, a header might actually be built into the compiler (at least the standard ones like math.h could be) or it could be compiled into some binary form that the preprocessor can more easily or efficiently handle. The specific representation details are left to the implementer's choice and will not be further discussed here. As such, I prefer to use the term header rather than header file or include file since the last two names imply a file representation. Whatever term you use, be consistent. Header Categories There are four categories into which headers can be classified: standard, system, library, and application. A standard header is one of the 15 defined by ANSI C, such as stdio.h, math.h, and string.h. ANSI requires you to include standard headers using the notation #include <header.h>. Do so even if #include "header.h" appears to work for them. A standard header is stored in some special place such that it can be accessed from all places in which a source file can be compiled. A system header is one supplied by the compiler vendor that can be used to interface to and/or exploit the host hardware and/or operating system. Examples on MS-DOS systems include bios.h and dos.h; on VAX/VMS, headers rms.h, rab.h, and fab.h are used to access the RMS file system; and on UNIX, the special set sys/*.h is provided. An implementer can provide as many system headers as he needs. VAX C, for example, comes with about 200.
Since system headers are useful to all applications, they are typically stored in the same place as standard headers. A library header is one provided with a third-party library such as a windows, graphics, or statistical package. Again, a product may include many headers and you may use a number of different libraries in the same application. Library headers are also universally shareable and will likely reside with standard and system headers. An application header is one you design for a particular application and as such, it should be located in a place separate from headers in the other three categories. It is possible, however, that over the course of designing an application, you build a header that is useful beyond the life of the current system. This header, then, should really be treated as a miscellaneous library header. If each programmer on the project develops his own private miscellaneous headers, naming conflicts can easily arise, so you must ensure that private headers are not used. During testing stages of a project, it can be very tempting to provide a quick (and often dirty) fix to a given problem by simply changing a header and recompiling the offending source module. However, this can cause other nasty side-effects later on when the system as a whole is rebuilt. Also, you must never, never, ever even think of changing a standard, system, or library header; these are sacred. For example, you might discover you need macros called TRUE and FALSE in several modules and since stdio.h is included in all of them, why not simply add definitions for these macros to that header? After all, it can't hurt any existing uses of these headers, can it? Apart from reflecting bad style, these changes are lost when you next (re)install the compiler. One solution to this is to make all headers, including application headers that have been moved to production, read-only.
That way, if you should ever try to change or overwrite them you are reminded of the seriousness of such an action. Header Names ANSI C requires the standard header names to be written in lower case. Do so even if your file system is case insensitive (as is the case with MS-DOS and VAX/VMS). In fact, ANSI does not require that filenames of the form header.h be supported by your file system. The compiler must accept #include <header.h>, but is allowed to map the period or any other part of that header name to other characters. The convention of naming headers with a .h suffix is exactly that, a convention, and it need not be followed by user-written headers. Certainly, it's a useful default convention if you have no good reason to do otherwise. If you wish to port code, keep in mind that the length of significance, case distinction, and format of filename (assuming a header is a file) are all implementation-defined. It is generally considered bad style to specify device and/or directory information in a header name. Considering that almost all compilers provide compile-time options and/or environment variables to specify an include search path, I see no reason to unduly reduce your flexibility. Header Contents Just what should go in a header and how big should headers be? It is relatively easy to answer the "what." If something cannot be shared, it does not belong in a header. For the record, candidates for inclusion in a header are: macros, typedefs, templates for structures, unions, and enumerations, function prototypes, extern data declarations, and preprocessing directives. Placing anything else in a header needs careful scrutiny. In particular, including executable code that is not inside a macro definition is very bad style. My rule of thumb is to put all related stuff together in one header. However, if that makes for a very large header and the contents can easily be broken into logical subsets, then I prefer each subset be in its own header.
It's useful to give such headers names with the same prefix so you can easily determine they are related. The only difference here is whether the preprocessor has to process one big header instead of just those parts it needs. Don't get too hung up on worrying how much work the preprocessor has to do unnecessarily, since that's what CPU cycles are for. In fact, in the extreme case where you put each declaration in its own header, the preprocessor won't need to do any extra work, except for opening and closing all those headers. It's quite likely that, while most things will fit neatly into related groups each in a header, some miscellaneous bits will be left over. About the only way to handle these reasonably is a miscellaneous header. ANSI C has one of these, called stddef.h. Whatever organization you choose, everything that can be shared should be shared. That is, you should make sure that all macros, function prototypes, etc., are part of some header and not hard-coded in source files directly. Each header should be self-contained. If one header refers to something in another header, the first should directly include the second. Forcing the programmer to know and remember the order in which related headers need be included is burdensome and unnecessary. Protecting Header Contents It is very likely that in some source modules you will include the same header multiple times, once directly and one or more times indirectly via other headers. Since everything in a header is supposed to be shareable, there should be no problem in processing the same header multiple times except the extra work of preprocessing. Right? Well, that's not quite true. Specifically, if the same typedef or structure, union, or enumeration template definition is seen more than once, the compiler produces an error, so these definitions must somehow be protected.
The best way to achieve this is to place a conditional compilation protective wrapper around the whole header, as follows:

/* header local.h */
#ifndef LOCAL_H
#define LOCAL_H
...
#endif

I prefer to use a macro spelled in upper case the same as the header, along with a suffix of _H. This naming convention is easy to understand and is very unlikely to be used for other macros elsewhere in the set of headers. A shorter name such as LOCAL could easily be used as a different macro elsewhere, leading to confusion. Since the standard headers can also be included multiple times and some of them contain typedefs and structure templates, these too must be protected. Check those provided with your compiler to see if they indeed are protected. The only difference between your wrapper and that used by the standard headers is that you must not begin your private macro name with an underscore, while they must, since that's the implementer's namespace. It is preferable to have each thing defined in one, and only one, header. However, for various reasons it may be desirable to duplicate something in multiple headers. The problem here is to make sure that all of those headers containing duplicates can be included at the same time. For example, consider the case of having a typedef for count in two headers, as in Listing 1. You should also check your standard headers for this kind of protection, since size_t, the type of the result of the sizeof operator, is required to be typedefed in five of them. Note that ANSI C places strict rules on whether a standard header can include another standard header. For example, most identifiers defined in a standard header are only "reserved" if their parent header is included. For instance, if you don't include one of the six standard headers that define NULL, you are perfectly safe in defining your own identifier NULL, even though it would be bad style.
Consider what would happen if assert.h included stdio.h: all the names in stdio.h would become defined as well, even though they are not defined in assert.h. And while assert.h could contain #undefs to remove these, there is no way for it to remove any typedefs or template definitions. Many mainstream compilers claiming ANSI conformance, or claiming to be tracking the ANSI standard, break this rule. As such, they are not ANSI-conforming. Check your standard headers for this. Conditional Inclusion There are a number of ways to conditionally include headers as necessary. Perhaps the best is to conditionally compile a subset of #include directives inside a header, based on the existence or value of a macro defined using a compiler option. That is, the compilation path is specified outside all source modules. This way, you can trigger any possible conditional compilation path from as few as one macro. You also have the ANSI invention of #include MACRO, where MACRO must expand to a header name of the form <...> or "...". You can also use the stringizing and token-pasting preprocessor operators # and ##, respectively, to construct a macro that expands to a header name. I have also found that it is a good idea to remove as many preprocessing directives as possible from source modules into headers. In particular, I find conditional compilation directives in source code to be most distracting, especially when there are more than two compilation paths. The aim is to isolate such dependencies into headers so you can forget about them and get on with the business of implementing or maintaining the application. An example of this strategy follows:

#if TARGET == 1
fp = fopen("DBAO:[direct]master.date", "r");
#else
fp = fopen("A:\\direct\\master.date", "r");
#endif

This can be implemented in a much clearer way by abstracting the filename into a header, as in Listing 2. Planning For Debugging And Maintenance People who don't design programs are unlikely to plan for debugging and maintenance.
Such people probably don't even write shopping lists. Unfortunately, there are lots of these people programming, many of them in C. It is very naive and probably irresponsible to believe that with a non-trivial program, debugging will be a mere formality and that you will always be around to maintain the code. Over the years I have found it a useful idea to include a header called something like debug.h in every source file I write when working on a non-trivial project. If the header is empty, that's fine. However, it makes it very easy to add or change that header's contents and recompile all or part of the system for testing. Since you have one header included everywhere, it is trivially easy to make powerful changes and to experiment. And the cost of having this flexibility is practically nothing, if you cater for it at the beginning. Concatenating Headers There are always people who try to stretch a language's capabilities to the extreme. For example, they place part of a source file in one header and the rest in another and include them both to form a valid source module. Cute, but very bad style. Let's look at just what can and cannot be split across multiple source modules, and therefore across multiple headers. A source module must contain complete tokens. That is, a source token cannot be split across two files. Specifically, the notation of backslash/newline continuation cannot be used in the last line of a source file. Likewise, a comment cannot span two files. With string literal concatenation now supported by ANSI, you could have a string in one file concatenated with a string in another, but that would require the strings to be outside a macro definition, and I have already said that's very bad style. You could also split a structure template definition across multiple files, but I see no benefit.
One thing not immediately obvious in ANSI C is that each matching set of #if/#endif and corresponding #elif and #else directives must be contained within the same source file. That is, the #if and matching #endif directives must be in the same source file. Conclusion I have addressed many issues here, most of which have arisen from my own experiences. I am sure there are others that could be added. For the most part, I find header design to be simply a matter of common sense once you know and understand the tools the language and preprocessor provide. But then again, I find that to be pretty much the solution to a vast number of problems. It's sad that common sense is not all that common.

Listing 1

/* h1.h */
#ifndef H1_H
#define H1_H
...
#ifndef COUNT_T
#define COUNT_T
typedef unsigned int count;
#endif
...
#endif

/* h2.h */
#ifndef H2_H
#define H2_H
...
#ifndef COUNT_T
#define COUNT_T
typedef unsigned int count;
#endif
...
#endif

#include "h1.h"    /* count defined */
#include "h2.h"    /* count not redefined */

Listing 2

/* files.h */
#if TARGET == 1
#define MASTER_FILE "DBAO:[direct]master.date"
#else
#define MASTER_FILE "A:\\direct\\master.date"
#endif

/* source.c */
#include "files.h"
...
fp = fopen(MASTER_FILE, "r");

On The Networks Games And Tongues Sydney S. Weinstein Sydney S. Weinstein, CDP, CCP, is a consultant, columnist, author and President of Datacomp Systems, Inc., a consulting and contract programming firm specializing in databases, data presentation and windowing, transaction processing, networking, testing and test suites and device management for UNIX and MS-DOS. He can be contacted care of Datacomp Systems, Inc., 3837 Byron Road, Huntingdon Valley, PA 19006-2320 or via electronic mail on the Internet/Usenet mailbox syd@DSI.COM (dsinc!syd for those that cannot do Internet addressing). RPN Fans - Here's One For You Before I took over David Fiedler's column, he mentioned in his last installment the ultimate on-screen calculator for UNIX systems.
Here now is a simpler one, usable on any system that has a curses package or emulation library. It emulates the HP-16C and can pop up on both UNIX and MS-DOS. Support for floating point, hexadecimal, decimal, octal and binary modes is provided. The calculator, written by Emmet Gray of the US Army, has ten registers and supports computer-oriented functions. It was posted to comp.sources.misc and is available from the archive sites that support that group, including uunet. New Games New versions of several games were distributed recently in comp.sources.games. These include version 4 of Conquer, a Middle-earth multi-player game for UNIX systems. Source to the game itself, as well as the patches, is available from the archive sites for comp.sources.games, including uunet. Conquer v4 patches are volume 8, issues 1 - 4. Nethack has also had a major update in comp.sources.games volume 8, issues 6 - 12. New screens and enhancements were added to this display-oriented dungeons and dragons game. Galactic Bloodshed, an empire-like war game, has also been upgraded this month in comp.sources.games, volume 8, issues 26 - 30. This upgrade gives several new versions to keep those UNIX systems busy. A new game has also appeared, a two-handed card game similar to Bridge and Spades (especially two-handed Spades). It's a trick-taking game with a trump suit determined by bidding. Cards are drawn from the deck, each player taking a turn drawing one card from the top of the deck. If you desire to keep that card, it becomes part of your hand and the next card is discarded without being seen; otherwise you discard it and take the next card. This yields two thirteen-card hands. Bidding is based on the number of tricks you think you can take, with the last winner naming the trump. Lastly, the hand is played out. Scoring is simple; if the bid is made, you score ten times the bid plus the number of overtricks. If you go down and don't make the bid, you score negative ten times the bid.
The winner is the first player to 250 points. The author, Scott Turner from UCLA, has asked for help in improving the bidding process. He has provided a program with a very interesting set of bidding options coded as rule-based, as neural networks, and then as a cheating bidder that reads both hands. However, he is not happy with the outcome and is asking for help. The program gives ample statistics for tuning a bidding algorithm, and those of you up to a challenge just might want to take him up on his offer for help. Back To Work Several serious works also appeared recently on the networks. For those diehard fans of vi-type editors, comp.sources.misc recently distributed "stevie" (ST Editor for VI Enthusiasts), a public domain clone of UNIX's vi editor. This version was developed for the Atari ST, but has since been ported to UNIX, OS/2, DOS and Minix-ST. Unsupported ports included in the release are Minix-PC, Amiga, and some Data General systems. Thus, stevie appears to be extremely portable. Makefiles are included for all the systems. Stevie's main drawback, for some environments, is that it keeps the file being edited in memory, limiting the size of the file to be edited on systems with smaller address spaces or without virtual memory. It was originally written by Tim Thompson, but this latest version was posted by Tony Andrews at onecom!wldrdg!tony. He also will mail diskettes to those who send him a formatted disk along with a self-addressed, stamped disk mailer for returning the disk. He can write Atari ST (SS or DS) or MS-DOS (360K or 1.2M) formats. His address is Tony Andrews, 5902E Gunbarrel Avenue, Boulder, CO 80301. Now that Berkeley has released much of its BSD 4.3-tahoe release to the public, sections of it are being ported to UNIX System V and Xenix. Comsat, the BSD mail notification daemon, was recently posted to comp.sources.misc. Comsat sends messages to users when mail is delivered for them.
It uses a daemon approach, and thus does not need to wait for the current command to complete or for the user to type a carriage return to the shell. Also included in this port are changes to smail v2.5 necessary for it to notify comsat when mail is delivered. Users control whether or not they get notification using the biff command, which is also included. Since UNIX System V usually doesn't support the Berkeley socket interface, this port uses named pipes, so the notification is limited to the local machine. Those with the socket interface can use the BSD version of the program. Thanks to David MacKenzie for his porting effort. Foreign Tongues? In volume 8, issues 65 - 87, comp.sources.misc distributed a major effort that will strike people as either a godsend or totally useless. If you need to print foreign languages with their extended character set support, the "cz text to PostScript system" is for you. It is a table-driven system that can be used to convert any "context-free octet-based character set" into PostScript. This means that every character in the character set is represented by one or more eight-bit bytes and that only the bytes of that character determine what it prints, not other bytes in the file. This excludes locking shift sequences. Even if you don't need the foreign language support, the posting had an addendum called libhoward that includes several C functions to convert numeric literals to internal representations and perform string manipulation, all with error recovery. It's all documented and worth looking at, even just to see how he did it, courtesy of Howard Gayle of Ericsson Telecom AB in Sweden (howard@dahlbeck.ericsson.se). Yea! It's Back, Maybe? After a long absence from USENET with no postings, comp.sources.unix distributed the first program of Volume 20. It is a contribution from Barry Books at IBM, releasing into the public domain an include file tester. This tester checks include files for POSIX 1003.1 and ANSI compliance.
It reports missing items, additional items allowed by the standard, and additional items not allowed by the standard. References to the standards documents are also included in the report. This could prove to be a really useful tool for portability. Unfortunately, after this promising posting, comp.sources.unix has been quiet again. Hopefully, Rich Salz, the moderator, will find time to resume the postings shortly. Upcoming Releases Perl, Larry Wall's Practical Extraction and Report Language, is going through its beta period on a new version via alt.sources. Version 3 has lots of new features, and next time I will give an in-depth review of this new release from one of the net's most respected authors of "Off the Wall Software". Less, a more replacement (a display pager), is also in beta test with its newest release. alt.sources is wonderful for hints of what is to come. Many authors are using it for beta test distributions. Another major package is also in its latest beta round; the Extended Portable Bitmap Toolkit appeared recently in alt.sources. This set of tools is used to convert images from one bitmap format to another. It supports many formats and, again, next time, a more detailed report. If you have a pending release you would like covered in this column, drop me a line. My electronic address is syd@DSI.COM and I look forward to hearing from you. Questions & Answers malloc, Porting, And Stack Overflow Ken Pugh Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and is a member of the ANSI C committee. He also does custom C programming for communications, graphics, and image databases. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You can fax questions to (919) 493-4390. When you hear the answering message, hit the * button on your telephone.
Or you can send e-mail to kpugh@dukeac.ac.duke.edu (Internet) or dukeac!kpugh (UUCP). Q I was having problems using malloc on a UNIX machine. After allocating some memory with malloc(), I wrote past the end of the allocated memory. The next time I called malloc(), it hung up. I ran the same program on an IBM-PC and it worked fine. What gives? Jim Campbell Durham, NC A Writing beyond (or before) the memory space that is allocated with malloc and related functions can cause some serious problems. These functions allocate a block of memory from the heap (memory space not used for code, data, and stack). They return the address of the memory block. The memory remains allocated until you call free(), passing it the address of the block. This deallocates the block and returns it to the heap. When the program exits, the system will free any allocations for which you have not called free(). These functions look like:

#include <stdlib.h>

void *malloc(size_requested)
size_t size_requested;    /* number of bytes */

void free(pointer)
void *pointer;    /* address of memory to free */

You request an amount of memory in bytes. The function returns to you an address which points to the first byte of the allocated memory. You can use this memory for any purpose. However, you should not write in the memory preceding or following the allocated block. The operating system and/or the compiler usually use a few bytes of memory adjacent to the allocated block. These bytes, sometimes called the "block header", may come before or after the block. The header keeps such information as the size of the block allocated, and usually some pointers, including one to the next block (i.e., a linked list). If the information in this block header is destroyed, the system cannot allocate a new block or deallocate an old block. Basically, the block looks something like the diagram in Figure 1. Let's assume that the information is kept after the block, as it appears in the case of your UNIX machine.
You probably did something like:

    char *pc;
    char *pc1;

    pc = malloc(100);
    ...
    *(pc + 100) = 0;
    ...
    pc1 = malloc(200);

and overwrote the first byte in the block header. When you attempted the next allocation, malloc() hung up because you had destroyed the block header for the previous block. On a PC, the block header typically appears before the allocated memory. In that case, your program ran okay, as you were simply writing into unallocated memory, which contains no information. Depending on the order in which you perform allocations and illegal accesses, you could still have problems. For example, let's assume that you performed both allocations first, and then an illegal access:

    char *pc;
    char *pc1;

    pc = malloc(100);
    pc1 = malloc(200);
    *(pc + 100) = 0;

Assuming that you do not attempt to allocate blocks later on in the program, this will execute as if no error occurred until the program attempts to exit. When the operating system tries to free the allocated memory, it will become confused by the erroneous block header information. You will get the dreaded "Memory allocation error -- system halted" message. With some compilers, malloc() does not call the operating system routine if the request can be satisfied from its own unallocated buffer. In this case, you may not see this allocation error, since the exit operations will simply free the whole buffer at once and not the individual pieces. Q I am using an array of pointers; each pointer points to a structure; and each structure contains several strings of various lengths. 
My array of pointers is declared something like this:

    struct {
        char firstname[MAX_FIRSTNAME+1];
        char lastname[MAX_LASTNAME+1];
        char homephone[MAX_HOMEPHONE+1];
        char workphone[MAX_WORKPHONE+1];
        char areacode[MAX_AREACODE+1];
        char street[MAX_STREET+1];
        char city[MAX_CITY+1];
        char state[MAX_STATE+1];
        char comments[MAX_COMMENTS+1];
    } *record[MAX_RECORDS];

It follows that I could display each element of the structure that represents the current record as follows:

    show_record()
    {
        printf("%s\n", record[current_record]->firstname);
        printf("%s\n", record[current_record]->lastname);
        printf("%s\n", record[current_record]->homephone);
        printf("%s\n", record[current_record]->workphone);
        printf("%s\n", record[current_record]->areacode);
        printf("%s\n", record[current_record]->street);
        printf("%s\n", record[current_record]->city);
        printf("%s\n", record[current_record]->state);
        printf("%s\n", record[current_record]->comments);
    }

However, it seems that much of the code is unnecessarily duplicated. It would be more efficient if I could create a loop and access a different element of the structure each time through the loop. My show_record() function would then look something like this:

    show_record()
    {
        int i;

        for (i = 0; i < NUM_OF_FIELDS; i++) {
            printf("%s\n", record[current_record]->??? );
        }
    }

where ??? is the part I can't figure out. I could think of ways to do it in assembly language by providing additional data types and accessing them in the loop. Since the elements of a structure are usually word aligned, it's hard to even be sure how many bytes are between the elements of the structure. Again, any information you could provide would be greatly appreciated. Jonathan Wood Irvine, CA A Accessing individual members of a structure in a loop is a commonly needed operation. There are several ways that you can do this. Let me change your structure template slightly and add a tag-type. 
I normally avoid declaring variables when declaring a structure template. A clean structure template is a handy thing to have around because it makes declaring variables of the same structure in another program a breeze.

    struct s_record {
        char firstname[MAX_FIRSTNAME + 1];
        ...
    };

    struct s_record *record[MAX_RECORDS];

You could use a static variable, which will have constant addresses, and set up an array of pointers to those addresses. (An array name such as print_record.firstname yields the address of the array's first element.) show_record() might then look like:

    static struct s_record print_record;

    #define NUMBER_FIELDS 9
    char *record_field_address[NUMBER_FIELDS] = {
        print_record.firstname,
        print_record.lastname,
        ...                     /* Remainder of the fields */
    };

    show_record()
    {
        int i;

        /* Copy in the record to be printed */
        print_record = *record[current_record];
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%s\n", record_field_address[i]);
        }
    }

One feature in the new ANSI standard, the offsetof() macro, can help you out here. Its syntax is:

    #include <stddef.h>

    offsetof(type, member-name)

The type is a structure type and the member-name is a member in the structure. Instead of keeping the address of individual members in an array, you simply keep the offsets from the start of a structure. For example,

    #define NUMBER_FIELDS 9
    size_t record_offsets[NUMBER_FIELDS] = {
        offsetof(struct s_record, firstname),
        offsetof(struct s_record, lastname),
        ...                     /* Remainder of the fields */
    };

Now show_record() could look something like:

    show_record()
    {
        int i;
        char *pc;

        pc = (char *) record[current_record];
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%s\n", pc + record_offsets[i]);
        }
    }

Note that the conversion of the address to a char pointer is necessary. If you simply printed out record[current_record] + record_offsets[i], you would get the address of something which is record_offsets[i] * sizeof(struct s_record) past the beginning of the record. 
I would suggest that you change the calling sequence of show_record() so that it expects a record (or the address of a record). This way, you can print out records that are not part of the array (such as a record that might be used for input purposes).

    show_record(record)    /* Prints out a record */
    struct s_record record;
    {
        int i;
        char *pc;

        pc = (char *) &record;
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%s\n", pc + record_offsets[i]);
        }
    }

or

    show_record(precord)   /* Prints out a record, whose address is passed */
    struct s_record *precord;
    {
        int i;
        char *pc;

        pc = (char *) precord;
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%s\n", pc + record_offsets[i]);
        }
    }

You might want to be even more organized and create another structure that contains not only the offsets, but also the names of the members, so that you can use the same names everywhere you print the record.

    struct s_field {
        char name[MAX_FIELD_NAME + 1];
        size_t offset;
    };

    #define NUMBER_FIELDS 9
    struct s_field fields[NUMBER_FIELDS] = {
        {"First name", offsetof(struct s_record, firstname)},
        {"Last name", offsetof(struct s_record, lastname)},
        ...                     /* Remainder of the fields */
    };

With this you might have a function like:

    show_record_with_field_names(precord)
    /* Prints out a record, whose address is passed */
    struct s_record *precord;
    {
        int i;
        char *pc;

        pc = (char *) precord;
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%-20.20s: %s\n", fields[i].name,
                pc + fields[i].offset);
        }
    }

You should note that elements of a structure are not necessarily word aligned. On a PC, they can be byte aligned or word aligned. I prefer packed (i.e., byte-aligned) structures, in order to save space, but there is a slight speed advantage in using non-packed structures. Note that the sizeof() operator and the offsetof() macro take into account any padding bytes (unused bytes due to alignment). In fact, it is the potential presence of padding bytes that made the ANSI committee eliminate the equality comparison of structures. 
For example:

    func()
    {
        static struct s_record record_1;
        struct s_record record_2;

        if (record_1 == record_2)   /* Not legal C */
            ...
    }

The padding bytes in record_1 will be set to 0, since it is a static variable. The padding bytes in record_2 will be garbage, since record_2 is an automatic variable. You could use the fields array shown above to create a structure comparison function, if you required it. Q How do you make a binary data file that is portable between the MAC and the IBM PC? Richard Walton Wellesley, MA A Porting data files between any two systems presents a problem in that the representation of the numbers varies from computer to computer. A common way of avoiding this problem is to output the data to a text file using fprintf() and to read the data on the other machine using fscanf(). For example, on one machine you would have:

    struct s_record {
        int one_number;
        double another_number;
    };

    write_record_to_file(data_file, record)
    FILE *data_file;
    struct s_record record;
    {
        int ret;

        ret = fprintf(data_file, "%d %lf\n",
            record.one_number, record.another_number);
        return ret;
    }

On the other machine, you would use:

    read_record_from_file(data_file, precord)
    FILE *data_file;
    struct s_record *precord;
    {
        int ret;

        ret = fscanf(data_file, "%d %lf",
            &(precord->one_number),
            &(precord->another_number));
        return ret;
    }

If you do not wish to have the overhead of the conversions done by fprintf() and fscanf(), then you will need to write some machine-specific code. 
For example, suppose on an IBM you have written out the records as:

    write_record_to_file(data_file, record)
    FILE *data_file;
    struct s_record record;
    {
        int ret;

        ret = fwrite(&record, sizeof(struct s_record), 1, data_file);
        return ret;
    }

On the other machine, you will have to rearrange the bit patterns manually:

    #define SIZE_BUFFER 8   /* Size of record on other machine */

    read_record_from_file(data_file, precord)
    FILE *data_file;
    struct s_record *precord;
    {
        int ret;
        char buffer[SIZE_BUFFER];

        ret = fread(buffer, SIZE_BUFFER, 1, data_file);
        /* Now you need to convert each value individually */
        convert_ibm_int_to_mac_int(&buffer[0],
            &(precord->one_number));
        convert_ibm_double_to_mac_double(&buffer[2],
            &(precord->another_number));
        return ret;
    }

Now each of the individual members must be dealt with separately. The double conversion is a bit of a bear. As they say in the teaching business, it is reserved as an exercise for the student. The integer conversion might look like:

    convert_ibm_int_to_mac_int(pibm_number, pmac_number)
    char *pibm_number;
    char *pmac_number;
    {
        /* Reverse the byte order */
        *(pmac_number) = *(pibm_number + 1);
        *(pmac_number + 1) = *(pibm_number);
    }

Note that I have simply shown a return value for each of these file functions. You probably want to be more clever and test the return values so that they are consistent among all the functions. For example, the first function might look like:

    #define BAD_IO 1
    #define GOOD_IO 0

    write_record_to_file(data_file, record)
    FILE *data_file;
    struct s_record record;
    {
        int ret;
        int io_ret;

        ret = fprintf(data_file, "%d %lf\n",
            record.one_number, record.another_number);
        if (ret < 1)
            io_ret = BAD_IO;
        else
            io_ret = GOOD_IO;
        return io_ret;
    }

Q I am in the process of implementing hotkey-controlled real-time data acquisition for some laboratory experiments. This is being achieved by taking control of keyboard interrupt number 0x09. My compiler is Microsoft C v5.1. 
The experimental apparatus has three distinct modes of operation: A, B, and C, which are to begin upon the striking of their respective keys from the keyboard. Assume that task A, defined by its function, fA(), is currently executing and that the user now strikes the key to commence task B, similarly defined by its function, fB(), so that fA() stops and fB() starts. My question is this: Can you continually interrupt function i and start function j and expect to escape a stack overflow? How does one handle suspending a function at an arbitrary time with no a priori intention of returning to it (which would free the stack space used by the function). I would imagine that you could do this a few times, but what about suspending A and starting B (or C) an arbitrary number of times? Perhaps setjmp() and longjmp() are the solution. Another serious problem that concerns me is that my method does not seem to admit a way to signal end-of-interrupt to the keyboard handler (or to whatever is listening). Because the directives to begin execution of function A, B, or C are embedded in the new 0x09 interrupt handler, the handler could potentially never finish executing during the experiment. Is there a better implementation which can achieve what I need and still use hotkeys? Mark S. Petrovic Stillwater, OK A You are right in your concern over stack overflow. If you keep calling an interrupt function without clearing up the stack (i.e., with an IRET instruction), you will eventually run out of stack space. An interrupt function that might cause overflow could look like the following, where keyboard_input() is a function that gets the actual keystroke. 
    control_function()
    /* This will only be called on a keyboard interrupt */
    {
        int c;

        /* Get the key that was hit */
        c = keyboard_input();
        switch (c) {
        case 'A':
            function_a();
            break;
        case 'B':
            function_b();
            break;
        case 'C':
            function_c();
            break;
        default:
            function_default();
            break;
        }
        /***** This function never returns *****/
    }

    function_a()
    {
        /* Code to perform function A */
    }

    function_b()
    {
        /* Code to perform function B */
    }

    function_c()
    {
        /* Code to perform function C */
    }

    function_default()
    {
        /* Code to perform default function */
    }

Every time you invoke the interrupt, another set of flags and return addresses is pushed onto the stack. setjmp()/longjmp() provide an appropriate mechanism for implementing the sort of structure you desire. These two functions allow you to set a place marker in your code (setjmp) and then jump directly back to it from another routine (longjmp). Without setjmp/longjmp, to report an error that occurred several levels deep in a program, you would have to return an error value at every level as you exit the nested calls. With setjmp/longjmp you can instead simply jump back to a central error handler and give it the error value. The function calls are:

    #include <setjmp.h>

    int setjmp(environment)
    jmp_buf environment;    /* Will hold the place information */

and

    void longjmp(environment, return_value)
    jmp_buf environment;    /* Place information from setjmp */
    int return_value;       /* To be returned to setjmp */

setjmp() returns 0 the first time it is invoked. The calling function can test for this and ignore any error condition. When longjmp() is called, the next C instruction to be executed is the equivalent of a return from setjmp(). This returns execution to the place marked by the call to setjmp(). One of the parameters to longjmp() is a non-zero value which becomes setjmp()'s return value. longjmp() cleans up the stack from any nested function calls. The parameter passed to setjmp() is of type jmp_buf. This variable holds information regarding the current position of the stack. 
You can call setjmp() in many different places and pass it different variables of type jmp_buf. The variable passed to longjmp() determines to which of the setjmp() calls it will return. The code below gives an indication of how your problem might be programmed. You would connect this up to the keyboard interrupt. Note that the jmp_buf must be static, so that it retains the place information between interrupts.

    #include <setjmp.h>
    #define TRUE 1
    #define FALSE 0

    control_function()
    /* This will only be called on a keyboard interrupt */
    {
        int c;                      /* Character input */
        int ret;                    /* Return value from setjmp() */
        static jmp_buf environment; /* For the setjmp */
        static int init = FALSE;    /* First time through flag */

        if (init) {
            /* Stop previous execution */
            longjmp(environment, 1);
        }
        ret = setjmp(environment);
        if (ret == 0) {
            /* This is the return from the initial setup */
            init = TRUE;
        }
        else {
            /* This is the return from the longjmp */
            ;
        }
        /* Get the key that was hit */
        c = keyboard_input();
        switch (c) {
        case 'A':
            function_a();
            break;
        case 'B':
            function_b();
            break;
        case 'C':
            function_c();
            break;
        default:
            function_default();
            break;
        }
        /******* THIS FUNCTION NEVER RETURNS *******/
    }

Alternatively, you could avoid using an interrupt by coding each function to periodically check for something on the keyboard stack. This approach does kludge up your lower level functions. However, if the lower level functions have sections of code that should not be interrupted, then this less elegant method may be preferable. Two Microsoft (and some other compiler) functions (not ANSI standard) support this alternate approach. The kbhit() function returns non-zero if there is a key in the buffer. The getch() function returns a character from the buffer, without waiting for a carriage return. 
    #include <conio.h>
    #define TRUE 1
    #define FALSE 0

    main()
    {
        int c;

        while (1) {
            /* Get the next key */
            c = getch();
            switch (c) {
            case 'A':
                function_a();
                break;
            case 'B':
                function_b();
                break;
            case 'C':
                function_c();
                break;
            default:
                function_default();
                break;
            }
        }   /* End while loop */
    }

    function_a()
    {
        /* Code to perform function A */
        /* Inside each loop: */
        if (kbhit())
            return;
    }

    function_b()
    {
        /* Code to perform function B */
        /* Inside each loop: */
        if (kbhit())
            return;
    }

    function_c()
    {
        /* Code to perform function C */
        /* Inside each loop: */
        if (kbhit())
            return;
    }

Figure 1 New Releases A New Year's Wish List Kenji Hino Kenji Hino is a member of The C Users' Group technical staff. He holds a B.S.C.S. from McPherson College and an undergraduate degree in metallurgy from a Japanese university. He is currently working toward an M.S.C.S. at the University of Kansas. New Releases CUG299 -- MEL and BP This volume contains two programs, MEL -- Universal Metalanguage Data Processor submitted by George Crews (TN), and BP -- Back Propagation for neural networks by Ronald Michaels (TN). MEL provides an I/O interface between a program and the user. It can take input data written in "pseudo-English" and translate it into program variables. It can also translate a program's variables into pseudo-English. (See the article on page 33 in this issue.) MEL was originally designed for use with engineering analysis programs. It was written in ANSI C and was developed using Microsoft C v5.1. The disk includes MEL source code, a test example program, sample input and output files, documentation, and the article and listings from this issue. Since MEL provides only a processor engine, you need to define your own input and output data format rules (called a dictionary) for your application program in mel.h. BP is a simple implementation of the back propagation algorithm as an example of a neural network. 
The implementation is based upon the article in Nature, "Learning representations by back-propagating errors" by Rumelhart, Hinton, and Williams. BP employs an adaptive algorithm that converges as a result of learning. BP was developed on an AT clone with a math coprocessor using Zortech C v1.07. The disk also includes a Hercules graphics version of BP. CUG300 -- MAT_LIB Our first volume in the 300s is a shareware package, MAT_LIB -- Matrix Library, submitted by John J. Hughes III (TN). MAT_LIB includes approximately 50 C language functions and macros which input and output tabular data maintained in ASCII text files. While the tabular data is in RAM, it is stored in dynamically-allocated token or floating-point arrays on the heap. Functions are provided to examine an ASCII text file to determine the number of rows, columns, and token size of the tabular data in the file. Other C macros dimension either a floating-point or string token array large enough to hold the ASCII data. Once the data is in memory, floating-point matrix operations can be performed on it. Token array data can be converted to and from float or integer values. Floating-point arrays which have been modified by calculation can be merged into token arrays for output, or they can be output to a text file directly. The output text files, which remain in MAT_LIB text file formats, can in turn be used as input for later application programs. The disk includes a users manual, test programs, example programs, and small and medium model libraries for Turbo C. The library source can be obtained for $20 from the author (John Hughes III, 928 Brantley Dr., Knoxville, TN 37923). CUG301 -- BGI Applications This volume contains graphics applications that use the Borland Graphics Interface (BGI), submitted by three authors, Mark A. Johnson (CO), Henry M. Pollock (MA), and John Muczynski (MI). All programs were compiled with Turbo C and use BGI files. The disk includes C source code, executable code, and BGI files. Mark A. 
Johnson has created DCUWCU -- a simple application environment that provides a mouse-driven cursor, stacked pop-up menus, and forms that contain editable fields and a variety of selectable buttons. The sample program DRAW allows you to draw lines, circles, and text on the screen using a mouse. A stacked pop-up menu can be invoked anywhere on the screen (Figure 1). DRAW uses public domain Microsoft mouse routines written by Andrew Markley (CUJ Sept/Oct 1988). An article describing DCUWCU appeared in the Jan '89 issue of CUJ (p. 67). Henry M. Pollock has submitted a demonstration program combining trig functions and graphics functions in Turbo C. By selecting an option from the menu, the program displays circleoids, asteroids, spirals, cycloids (Figure 2), etc. My review of the JJB library in the October 1989 issue prompted John Muczynski to create a graphics pull-down menu system with deeply nested menus. The separate include code allows you to change key assignments and create macros. The new configuration may be saved and restored. He also has submitted an example program, "Conway's game of life," using the pull-down menu. Updates CUG295 -- blkio Library The blkio library released in the November issue has been updated. Version 1.1 includes minor bug fixes and modifications. Retrospective CUG started collecting and maintaining public domain source code (originally just BDS C source code) nine years ago. The library started with just ten standard CP/M 8-inch disks. Currently, the total number of volumes (one volume includes one to three 360K MS-DOS disks) has surpassed 300. The past nine years have brought remarkable changes in C compiler technology and in the microcomputer marketplace. Figure 3 shows the change in formats requested by our members. Over the past three years, CP/M has become virtually extinct and MS-DOS has come to dominate. More interesting, however, is the diversity of operating systems used in recent years. 
Macintosh, UNIX/Xenix, Atari and Amiga have appeared more than ever -- indicating that more and more programmers who use non-MS-DOS operating systems are interested in C and are seeking portable C source code. I think this trend is strong evidence that C is a portable language. Table 1 shows the 20 most popular disks in the last three years. The most-ordered CUG disk is MicroEmacs v3.9 (CUG#197 and CUG#198). MicroEmacs faithfully implements most of the features of Richard Stallman's Emacs editor. Daniel Lawrence claims copyright privileges for this version, which has also been updated and enhanced many times by our staff and members. The secret of MicroEmacs' popularity seems to be its portability (it runs on more than ten different operating systems), rich set of features, and configurability -- a built-in macro language lets MicroEmacs be tailored to virtually any task. The next two most popular disks are UNIX tools used in compiler development. CUG#172, #173 and #290 are LEX, a lexical analyzer that extracts tokens from an input character stream. CUG#285 is a YACC-compatible parser and code generator. As you'll notice from the Top 20 list, our library contains a wide variety of application programs and development tools, including cross-assemblers, windows, graphics, an AI application, communications, and a math package, among others. One of the more recent trends in the library is the emergence of shareware. Even though you must pay some minimal fee for the source code of a shareware program, the quality of some volumes is very competitive with more expensive commercial products. Another trend is the submission of more serious and specialized applications -- for example, the 3-D medical imaging software on CUG#293-294. Wish List Even with all this diversity, there are many frequently requested packages. A Simple Text Editor Many people have asked for a simple text editor that can be embedded in their application. 
The editor needn't be fancy and powerful like MicroEmacs, but should offer these features:

    Be callable (as a function) from the application program
    Function in both full-screen and windowed applications
    Retrieve and save a file
    Browse a file (page up/down)
    Be modeless
    Support block manipulations (block copy, move, or delete)
    Compile under the small model under MS-DOS
    Read up to 30K of ASCII text
    Search and replace (optional)
    Go to a specified line number (optional)

An ANSI C Compiler This is a real challenge. We hope to address this need by distributing the GNU C compiler (and C++ compiler) from The Free Software Foundation. .PCX Or .DBF File Function Libraries A .PCX file is an image file produced by ZSoft's PC Paintbrush. It is a common graphics file format for the PC and is also used by most scanners, fax programs, and desktop publishing programs. A .DBF file is a data file used by Ashton-Tate's dBase programs. We need function libraries that manipulate these standard format files. Spread Sheet As with the editor, we need a simple spread sheet that can be embedded in larger applications. Pascal To C Translator This would be useful for Pascal programmers trying to port their programs. Michael Yokoyama (HI) has forwarded such a program to us, but we have been unable to contact the author, Per Bergsten of Sweden, to get permission to release the program. Please let us know if you can contact Per Bergsten or know of an independent version of this code. C To Pascal Useful for Pascal programmers who want to port an application program written in C. Cross C Compiler Thanks to Will Colley, we have a variety of cross assemblers. However, our only cross C compiler is CUG204, a 68000 C compiler by Matthew Brandt, which runs under MS-DOS and generates 68000 object code. We need more variety in this area (like a cross C compiler that runs on the Mac and generates 8086 code). 
Download Fonts To A Laser Printer All sorts of applications could make better use of laser printer capabilities if they could download special fonts. We'd like a library of functions that can read Bitstream, Ventura bitmap, and other popular font files and download them to an HP-compatible printer. Sideways Text Not a configuration utility that uses a printer's landscape mode, but a utility that exploits a printer's graphics mode to print text rotated 90°. Why not? Database Management We would like a simple and useful relational database manager -- in C. If you've seen C source code such as that listed here or can implement it, please let us know. In addition, we are interested in obtaining C++ and C source code for the Macintosh. Moreover, I believe you have your own wish list. Please let me know about it for a future column. P.S. Henri de Feraudy of France, the author of Small Prolog in CUG#297, is sending us a PC version of Little Smalltalk. It will be a new release in a future issue. Figure 1 Figure 2 Figure 3 Table 1

Year 1987
1. 173 LEX Part 1 (lexical analyzer)
2. 172 LEX Part 2
3. 198 MicroEmacs v3.9 Source (text editor)
4. 197 MicroEmacs v3.9 Executable & Documentation
5. 175 (Replaced with CUG285)
6. 174 (Replaced with CUG285)
7. 201 MS-DOS System Support (ANSI driver, TSRs, etc.)
8. 204 68000 C Compiler (cross compiler for MS-DOS)
9. 236 Highly Portable Utilities (UNIX-like tools)
10. 200 Small C Interpreter
11. 220 Window BOSS (window library)
12. 227 Portable Graphics
13. 164 Windows
14. 218 Dictionary Part I
15. 217 Spell & Dictionary Part II (spell checker)
16. 155 B-TREES, FFT, etc. (balanced binary tree, fast Fourier transform)
17. 228 Miscellany IX (window, ISAM routines, etc.)
18. 165 Programs from Reliable Data Structures (from Plum Hall)
19. 216 Zmodem & Saveram (communication)
20. 226 ART-CEE (rule-based inference engine)

Year 1988
1. 197 MicroEmacs v3.9 Exec. & Doc. (text editor)
2. 198 MicroEmacs v3.9 Source
3. 259 Console I/O & Withers Tools (window functions)
4. 255 EGA Graphics Library
5. 172 LEX Part 1 (lexical analyzer)
6. 173 LEX Part 2
7. 260 Zmodem, CU, tty Library (communication)
8. 236 Highly Portable Utilities (UNIX-like tools)
9. 151 Ed Ream's Screen Editor for IBM PC
10. 263 C_wndw Toolkit (windows)
11. 248 Micro Spell (spell checker)
12. 241 Inference Engine & Rule Based Compiler
13. 242 Still More Cross Assemblers
14. 155 B-TREES, FFT, etc. (balanced binary tree, fast Fourier transform)
15. 227 Portable Graphics
16. 247 Miracl (multi-precision integer and rational arithmetic C library)
17. 246 Cycles, Mandelbrot
18. 232 Little Smalltalk - Unbundled Part 2
19. 231 Little Smalltalk - Unbundled Part 1
20. 265 cpio Installation Kit (archive utility)

Year 1989 (Until October)
1. 197 MicroEmacs v3.9 Exec. & Doc.
2. 198 MicroEmacs v3.9 Source
3. 285 Bison for MS-DOS (YACC-like parser)
4. 290 FLEX (fast lexical analyzer)
5. 263 C_wndw Toolkit
6. 283 FAFNIR (general-purpose, table-driven forms engine)
7. 277 HP Plotter Library (graphics)
8. 173 LEX Part 2
9. 172 LEX Part 1
10. 284 Portable 8080 Emulator
11. 260 Zmodem, CU, tty Library
12. 236 Highly Portable Utilities
13. 276 Z80 and 6804 Cross Assembler
14. 155 B-TREES, FFT, etc.
15. 241 Inference Engine & Rule Based Compiler
16. 242 Still More Cross Assemblers
17. 273 Turbo C Utilities
18. 261 68K Cross Assembler for MS-DOS
19. 220 Window BOSS (window library)
20. 292 ASxxxx C Cross Assemblers

C Programmer's Toolbox/PC Kenji Hino Kenji Hino is a member of The C Users' Group technical staff. He holds a B.S.C.S. from McPherson College and an undergraduate degree in metallurgy from a Japanese university. He is currently working toward an M.S.C.S. at the University of Kansas. Unlike UNIX, MS-DOS has no standard utility programs to support C programmers in program development or maintenance. In the past, C programmers have developed their own tools from scratch or ported tools from other operating systems to MS-DOS. 
UNIX tools have been ported most often, simply because they are the "right" tools for improving programmer productivity. This report looks at a collection of UNIX-like tools, C Programmer's Toolbox/PC revision 2.0 by MMC AD Systems. Components The Toolbox/PC consists of Volumes I and II, which are available separately or together. I recommend getting both. Each volume includes two IBM 360K disks and costs $99.95; both volumes together go for $175. The manual (in a binder) describes both volumes, regardless of whether you purchase Volume I, II, or both. The C Programmer's Toolbox is available from MMC AD Systems, Box 360845, Milpitas, CA 95035, phone (408) 263-0781. Although the Toolbox/PC runs on PC/MS-DOS, MMC AD Systems also distributes versions of the Toolbox for the Macintosh MPW and the Sun UNIX system. Installing the Toolbox on either a floppy disk or hard disk system is straightforward: just copy all files from the distribution to your disk. If you install the Toolbox on a hard disk system, be sure that the path is set correctly. The Tools The Toolbox includes 21 tools (see Table 1). All the tools are command-line driven. The corresponding UNIX tools are also listed in the same table. The tools help analyze the structure, format, and execution of programs; manipulate and/or modify program input/output data; or verify program input/output data (see Figure 1). Covering all 21 tools in a report this size is impractical and undesirable. Thus, I will focus on the analytical tools, CFlow, PMon, and CritPath. These tools are mainly used to understand a program's structure and to analyze the performance of an application program for enhancement. CFlow Whether developing or maintaining a program, as the program becomes larger, you tend to lose sight of the overall program structure. Discerning the inter-relationships between modules becomes harder as the program grows. Even worse, you may have to study code written by somebody else. 
CFlow is a tool for studying code. It scans one or more C source files to generate reports that describe the hierarchy of both defined and invoked functions (external or library functions). Figure 2 shows a program flow tree, one of the reports produced by CFlow. (The analyzed source code is shown in Listing 1 and is adapted from a program in the CUG PD Library. The original author is Richard Threlkeld.) The line indentation indicates the level of function invocation. If the same function is referenced more than once, the line number of the last reference is attached to the beginning of the line. An asterisk (*) indicates that the function is an external or run-time library function. Within the parentheses following a function name are the source filename and the starting line number of the function definition. In order to obtain the desired result, you must specify the dash/slash options appropriately. For example, function names at each level of a CFlow tree are displayed in alphabetical order by default. If you want function names displayed as they are encountered, use the -e option. In addition, when using multiple input files, the -f option is useful for displaying the location of each function. Version 2.0 includes many improvements over the previous version. CFlow now reports function pointers (such as (*a)()) and function addresses (such as f(); a = f;). It also has a virtual memory system that handles programs of unlimited size (true for some of the other tools, too). The biggest improvement is that CFlow now automatically preprocesses your source code. That is, it recognizes #if directives to read and process the appropriate portions of your code. This, however, creates one problem. If a function is actually a macro, it is expanded and replaced with some system-level function, surprising you with an unfamiliar function name in the report, such as _filbuf() instead of getc(). 
This can be solved by turning off the preprocessor with the -p switch, thereby sacrificing all the preprocessor benefits. Along with the CFlow Tree, when you specify the proper dash/slash options CFlow generates a Master Define Function List (a list of callers and callees), an Undefined Function List (a list of external or library functions), and a Function Called-by List (a list of callees and callers). Using CFlow, a programmer can easily and quickly understand how a program is structured and which module is invoked by which module. For a more visual understanding, you can draw a structure diagram like Figure 3, based on the Program Flow Tree. In Figure 3, for example, if a portion of the code in crc_update() is modified, you know from the reports which other functions will be affected (in this case, crc() and crc_finish()). PMon And CritPath PMon is an execution profiler: it determines how much execution time is spent in each symbol (function or BIOS/DOS call) or program area. During program execution, PMon resides in memory with the target program, interrupts the program at regular intervals, and examines the target program's CS:IP register to determine which section of code is currently executing. PMon tallies this information for each interception and, using the symbol entries from the .MAP file, generates a set of reports. I tested PMon using the CRCK (Cyclic Redundancy ChecK) program CRC15.EXE. The program listing of CRC15 is in Listing 1; it must be compiled and linked so as to generate a .MAP file. The .MAP file is then processed by MapVar and loaded into PMon along with the target executable. Figure 4 shows two reports resulting from monitoring CRC15. The first report is the Program Execution Summary, which gives a complete synopsis of the program's execution. Descriptions for certain summary headings are: Total execution clicks. 
The total number of clock ticks recorded during program initiation, execution, and termination. Total monitored hits. The actual number of clock ticks recorded during program execution. Total symbol entries. The total number of symbols (function names) used in the program. Number of symbols hit. The number of symbols detected during execution. Total symbol hits. The total number of times PMon found the program itself executing, as opposed to BIOS, DOS, or other resident programs. Time in program. The total time spent in the program vs. BIOS/DOS functions and other activities (Time below/above). Time in BIOS/DOS. The total time spent in BIOS/DOS functions. According to the Program Execution Summary, CRC15 processed one file within 6 seconds. Although CRC15 contains 115 symbol entries, PMon found only four symbols during program execution, even though it checked CRC15 a total of 92 times. CRC15 made 113 DOS system calls using 12 different DOS calls. Of the 92 checks, PMon found the program executing for 4.76 seconds (79.3%) and BIOS/DOS for 1.24 seconds (20.7%). The second report, the Symbol Execution Summary, shows where a monitored program is executing within itself, excluding DOS calls. Abs Addr -- the starting address (segment:offset) of a symbol. Hits -- the total number of times PMon found a symbol executing. Loc% -- the percentage of a symbol's activity compared with the total execution excluding DOS calls. Tot% -- the percentage of a symbol's activity compared with the total execution including DOS calls. Entry Name -- the symbol name. In this example, PMon detected that function crc_update(), whose starting address is 0:011e, executed 50 times and took 68.5% of the total execution time excluding DOS calls and 54.3% including DOS calls. 
In addition, PMon generates a BIOS Interrupt Summary, a DOS Function Call Execution Summary Report, and a DOS Function Call Execution Detail Report showing statistics on the BIOS/DOS operations performed during execution, such as character input/output and file input/output. Although these reports provide a good amount of information about software performance, further analysis can be done with the CritPath command. CritPath determines the critical path of a program by analyzing the reports generated by the CFlow and PMon commands. A program's critical path is the sequence of functions called from main() that consumes more execution time than any other sequence. Figure 5 shows a Critical Path Report generated by CritPath. The report provides the primary information necessary to improve a program's performance: a list of the 20 functions that used the most execution time (Top 20 Functions in Actual Time), a list of the 20 functions that, by themselves and through the functions they called, used the most execution time (Top 20 Functions in Cumulative Time), and finally a list of the functions that make up the program's critical path. In this example, the critical path runs through the functions crc() and crc_update(). CritPath also generates a Function Summary Report that evaluates the performance of all functions and system calls in the program, and a Weighted Hierarchical Program Flow Tree. Using the statistics produced by PMon and CritPath, programmers can spot places where performance could be improved. These tools only identify the weak spots, however; they don't suggest how to fix them. Such information can be found in books like Supercharging C With Assembly Language by Harry Chesley and Mitchell Waite (The Waite Group). 
Conclusion Overall, compared to the UNIX tools, the Toolbox tools have more options and provide more detailed information, giving the programmer more control over program output. On the other hand, he or she must read the manual very carefully and specify the appropriate options to generate the desired result. Furthermore, the input source code for some tools must be not only syntactically correct but also written in good programming style, even if it compiles cleanly; otherwise, the output can be confusing. For example, an inappropriate choice of options combined with poor programming style (as in Listing 1) causes CFlow to report the identifier crc as a function address rather than a variable (crc is used as both a function name and a variable name; this can be detected by CXref). CFlow also doesn't distinguish between a function invocation and a function declaration inside a function. For beginners, the Toolbox can be a good starting point for using tools to improve productivity, since the commands are very uniform and the manual is well written. Each tool is explained in the manual in a uniform way, using sample results; in particular, the observations and suggestions about the generated reports are honest, good advice for users. For advanced programmers, the combination of CFlow, PMon, and CritPath can provide clues for fine-tuning or improving software performance, either after a program has been developed or when it is about to be updated. CFlow, CPrint, CXref, and CLint can be used to study existing programs and will greatly reduce maintenance costs. 
Figure 1

Figure 2

*** Program Flow Tree ***
-------------------------
 1: main(CRC15.C:4)
 2:    crc(CRC15.C:29)
 3:       crc_clear(CRC15.C:58)
 4:       crc_finish(CRC15.C:80)
 5:          crc_update(CRC15.C:63)
 6:  5    crc_update()
 7:       exit(*)
 8:       fclose(*)
 9:       fopen(*)
10:       fprintf(*)
11:       printf(*)
12:       _filbuf(*)
13:  7 exit(*)
14: 11 printf(*)

Figure 3

Figure 4

*** Program Execution Summary ***

Program executed:                 crc15.exe
Delay/Run period (clicks):        0/0
Start date/time:                  October 19, 1989  19:45:12
Stop date/time:                   October 19, 1989  19:45:18
Elapsed execution time:           0:0:0:6  (6 seconds)
Total execution clicks:           95
Approximate clicks/second:        15.8
Approx sample period (ms):        63.2
Total monitored hits:             92
Total symbol entries:             115
Number of symbols hit:            4
% of total symbols hit:           3.5
Total symbol hits:                73
Avg hits/hit symbol:              18.3
Number of monitored interrupts:   2
Number of interrupts used:        2
% of total monitored:             100.0
Total BIOS interrupt calls:       141
Avg # interrupts/hit:             7.4
Total BIOS interrupt hits:        19
Avg # hits/interrupt:             0.1
Number of DOS calls used:         12
Total DOS program calls:          113
Time in program (secs):           4.76    % of total:  79.3
Time in BIOS/DOS (secs):          1.24    % of total:  20.7
Time below program (secs):        0.00    % of total:   0.0
Time above program (secs):        0.00    % of total:   0.0
Total KNOWN time used (secs):     6.00    % of total: 100.0
Total UNKNOWN time used (secs):   0.00    % of total:   0.0

*** Symbol Execution Summary ***

Abs Addr      Hits   Loc %   Tot %   Entry Name
--------   -------   -----   -----   ----------
      7a        12    16.4    13.0   _crc
     11e        50    68.5    54.3   _crc_update
     3e4         1     1.4     1.1   __chkstk
    1edc        10    13.7    10.9   __aNlshr

--- HINT ---
Concentrate on the following functions to improve your program's
performance:
    _crc         (13.0)
    _crc_update  (54.3)
    __aNlshr     (10.9)

Figure 5

*** Critical Path Report ***
----------------------------

Top 20 Functions in Actual Time
-------------------------------
Rank   Seconds   % Total   Function Name
----   -------   -------   -------------
  1.       3.3     54.3%   crc_update()
  2.       1.0     17.4%   __SysCall_3fH()
  3.       0.8     13.0%   crc()
  4.       0.7     10.9%   _aNlshr()
  5.       0.1      1.1%   _chkstk()
  6.       0.1      1.1%   __SysCall_3dH()
  7.       0.1      1.1%   __SysCall_40H()
  8.       0.1      1.1%   __SysCall_43H()
  9.       0.0      0.0%   crc_clear()
 10.       0.0      0.0%   crc_finish()
 11.       0.0      0.0%   exit()
 12.       0.0      0.0%   fclose()
 13.       0.0      0.0%   fopen()
 14.       0.0      0.0%   fprintf()
 15.       0.0      0.0%   main()
 16.       0.0      0.0%   printf()
 17.       0.0      0.0%   _filbuf()

Top 20 Functions in Cumulative Time
-----------------------------------
Rank   Seconds   % Total   Function Name
----   -------   -------   -------------
  1.       6.0    100.0%   crc()
  2.       6.0    100.0%   main()
  3.       2.7     44.6%   crc_finish()
  4.       2.7     44.6%   crc_update()
  5.       0.8     14.1%   __SysCall_3fH()
  6.       0.5      8.7%   _aNlshr()
  7.       0.0      0.0%   crc_clear()
  8.       0.0      0.0%   exit()
  9.       0.0      0.0%   fclose()
 10.       0.0      0.0%   fopen()
 11.       0.0      0.0%   fprintf()
 12.       0.0      0.0%   printf()
 13.       0.0      0.0%   _chkstk()
 14.       0.0      0.0%   _filbuf()
 15.       0.0      0.0%   __SysCall_3dH()
 16.       0.0      0.0%   __SysCall_40H()
 17.       0.0      0.0%   __SysCall_43H()

The Critical Path
-----------------
 Act    Act     Cum    Cum
  (%)   Rank     (%)   Rank
-----   ----   -----   ----
  0.0     15   100.0      2   main()
 13.0      3   100.0      1   crc()
  0.0     10    44.6      3   crc_finish()
 54.3      1    44.6      4   crc_update()

Critical path hits = 62         Total hits = 92
Critical path time = 4.0 secs   Total time = 6.0 secs
% of total = 67.4

Table 1

Toolbox Volumes I & II   UNIX tools    Description
==================================================
Cat                      cat, cp       Concatenate Data
CharCnt                  wc            Count Characters, Lines...
CFlow                    cflow         Trace C Program Flow
CLint                    lint          C Semantic Checker
CPrint                   cb, indent    C Source Code Beautifier/Reformatter
CritPath                               Critical Path Analyzer
CXref                    xref          C Cross Reference
Detab                    expand        Remove Tabs
Entab                    unexpand      Restore Tabs
ExecTime                 time          Time Program Execution
FileComp                 comp          Compare Files
FileDiff                 diff          Difference Files
FileDump                 od            Dump File
FileList                               List and Find Files
Fill                                   Expand Text Template
MapVar                                 Extract Load Map Variables
PMon                     prof, gprof   Program Performance Monitor
STrip                                  Extract Text
Tail                     tail          Copy End of File
TabTran                  sed           Translate Tabs
TransLit                 tr            Transliterate Characters

Listing 1

#include <stdio.h>

main(argc,argv)
int argc;
char **argv;
{
    int i;
    void crc();

    if (argc <= 1) {
        printf("USAGE:crc15 filename [filename...]\n");
        exit(1);
    }
    for (i = 1; i < argc; i++) {
        printf("\n%-s ", argv[i]);
        crc(argv[i]);
    }
    exit(0);
} /* main */

/*
 * CRC
 * Cyclic Redundancy Check
 */
void crc(argv)
char *argv;
{
    FILE *fd;
    int crc;
    int c;
    char crc_char;
    int crc_clear(), crc_update(), crc_finish();

    fd = fopen(argv, "rb");
    if (!fd) {
        fprintf(stderr, "Can't open %s !\n", argv);
        exit(1);
    }
    crc = crc_clear();
    while ((c = getc(fd)) != EOF) {
        crc_char = c;
        crc = crc_update(crc, crc_char);
    }
    crc = crc_finish(crc);
    printf("%04x", crc);
    fclose(fd);
} /* crc */

int crc_clear()
{
    return(0);
}

int crc_update(crc, crc_char)
int crc;
char crc_char;
{
    long x;
    int i;

    x = ((long)crc << 8) + crc_char;
    for (i = 0; i < 8; i++) {
        x = x << 1;
        if (x & 0x01000000)
            x = x ^ 0x01A09700;
    }
    return(((x & 0x00ffff00) >> 8));
}

int crc_finish(crc)
int crc;
{
    return(crc_update(crc_update(crc, '\0'), '\0'));
}

Publisher's Forum I've been reading documentation. It's no fun. Here's some advice from an experienced "how-to" writer, who's also an experienced programmer, about how documentation should be structured to be useful. Include an extended procedural tutorial. This section is for the user who doesn't have enough prior experience with similar products to guess what to do next. 
Don't mix tips about advanced tricks into this section, or cautions about product limitations and quirks. If you do, the user won't be able to find those important tidbits later without re-reading the entire section. In every "how-to" piece, focus is everything: give the procedural outline and just the procedural outline. Include a goal-oriented "Tips & Techniques" section. No matter what fruity name you give your product, there will be certain non-obvious tricks that make it more productive. Organize these by goals -- e.g. Printing Fields From A Join, Timestamping A File, Converting File Formats. This section should be rife with cross-references and redundancy. Each goal's discussion should at least cross-reference related material that appears elsewhere, and include all the other "extraneous" information you were tempted to toss in as asides in the procedural section. Short, well-targeted examples belong here. Even if your product is "truly unique", the goals should be stated in terms of commonly recognized paradigms so that my experience with similar projects can speed my adaptation to your product. Include a thorough technical specification. No, technical specifications don't help the beginner, but they are invaluable to an experienced user. Cross-reference the specs. Include hardware requirements, interface specifications, data structure templates, file specifications, and command-line syntax for subordinate modules (even for those modules that are normally invoked by some "integrated environment driver" -- don't presume to know better than the programmer what he needs to know to get the job done). Explain the design goals and philosophy. Virtually every product started in a specific environment with a specific, limited application in mind. Yes, marketing will want to promote the product as everything for everyone, but make room somewhere in the document for the truth. 
Sharing the design philosophy helps the programmer understand where the product fits and reduces the early frustration level. If I'm trying to use your tool in a development project, and I know the design goals that produced the tool, I stand a better chance of designing a project that can be built with the tool. Invest in a superb index. So what if the answer to my question is in the manual. How many times can I afford to read a 900-page primer to find the two lines that are critical? The answer is a very small integer; I'm going to be calling customer support. Get your ego and marketing's time-table out of the way and hire a professional to prepare a SUPERB index. Every dollar spent on an index will be returned ten-fold in reduced customer support costs. Explain the installation process for standard environments, and then explain what configuration options are available and how they interact. Give me this information even if you do bundle a whizbang installation utility. I've probably been at this long enough to have my own ideas about where to put my working files. In short, keep your reader in mind. Design your documentation to meet the user's needs over his entire life as a user: a detailed step-by-step to orient the beginner; well-packaged goal-organized information to support the exploration and growth of the intermediate user; and comprehensive, frank, and well-indexed reference material for the experienced and technically advanced user. I mean it. Robert Ward Editor/Publisher New Products Industry-Related News And Announcements UNIX Alternative Announced For The Apple Macintosh Technical Systems Consultants, Inc. has released a UNIX compatible, real-time operating system for the Apple Macintosh family. The system, UniFLEX, supports multi-tasking and multi-users and comes complete with all development tools, a C Compiler, TCP/IP Networking support and X Window System v11.3 software. 
A version has also been released for Force Computer's CPU-37 single-board VMEbus computer with integrated Ethernet hardware. For the Apple Macintosh family, the price for a single-system development license is $595. The price includes 90 days of phone support. For the Force CPU-37, the single-system licensing price is $1000 for UniFLEX/RT or $1800 for UniFLEX/RN with networking. Contact Technical Systems Consultants, Inc., 111 Providence Road, Chapel Hill, NC 27514 (919) 493-1451; FAX (919) 490-2903. Stepstone Updates Objective-C The Stepstone Corporation has released its Objective-C Compiler v4.0 running under MS-DOS and Microsoft's OS/2. The compiler implements a C-based hybrid object-oriented language and is ANSI C compatible. Objective-C v4.0 requires a PC/AT or PS/2 class machine running MS-DOS and Microsoft C v5.0. The compiler, packaged with a basic data structures library (ICpak101) and built-in extended memory support, is $249. Stepstone has also released its object-oriented user interface toolkit, ICpak201, for workstations running the X Window System v11. Product information is available from the Stepstone Corporation at (203) 426-1875, (800) 289-6253 or by mail to The Stepstone Corporation, 75 Glen Road, Sandy Hook, CT 06482. Lattice's New 6.0 Release Features ANSI Compliance Lattice, Inc. is shipping v6.0 of its C compiler for MS-DOS and OS/2. The release features major enhancements to the compiler, a global optimizer, new programming utilities, and a number of new library functions. Both the compiler and libraries are now ANSI compatible. Version 6.0 contains a new global optimizer, automatic register variable support, in-line function support, optimized libraries, and upgrades to the compiler. The Lattice C Compiler v6.0 allows program modules compiled under different memory models to be linked into a single program. Lattice v6.0 now includes LASM, a full-featured macro assembler with support for 386 systems. 
LASM is compatible with MASM, and its output is compatible with CodePRobe, so assembly language programs can also be debugged at source level. Utilities now bundled with the compiler are an overlay linker, a MAKE facility, a BIND utility, and several UNIX-like tools including EXTRACT, BUILD, DIFF, GREP, SPLAT, TOUCH, and WC. Programmer's tools in the compiler package include the CodePRobe source-level debugger, an integrated editor, an object module disassembler, an object module librarian, and an automatic installation program. In addition to the OS/2 API and special graphics libraries in the previous version, Lattice adds its Curses screen management library, a communications library, the dBC III library of database functions, and a protected-mode OS/2 library. The new list price of $250 includes unlimited free technical support through Lattice's telephone hotline, bulletin board, MIX network, or written correspondence. Lattice provides an unconditional, 30-day money-back guarantee with each product. For further information, contact: Lattice, Inc., 2500 South Highland Avenue, Lombard, IL 60148 (312) 916-1600; FAX: (312) 916-1190. Greenleaf CommLib V3.0 Released Greenleaf Software has released a new version of its communications library, CommLib. Greenleaf CommLib v3.0 includes Kermit, XModem, XModem 1K, and YModem batch file transfer protocols. It fully supports automatic RTS-CTS hardware flow control, Hayes modem control functions, and XON/XOFF software flow control. CommLib automatically filters up to three codes from the receive stream, stores status along with data in a "WideTrack Receive" mode, and can programmatically ignore or react to modem status at the interrupt service level. The Greenleaf CommLib supports the PC, XT, AT, PS/2, and compatible machines using COM1 and COM2 ports, and COM3..COM8 on a PS/2. It also supports up to 35 ports when using multiport boards. 
It can serve several families of multi-port boards, including Digiboard, Stargate, Arnet, Contec, Quatec, and Quadram. CommLib v3.0 is $299. For additional information and a free Demo disk, contact Greenleaf Software, Inc.; 16479 Dallas Parkway, Suite 570; Dallas, TX 75218; (800) 523-9830; FAX (214) 248-7830. XVT Now Runs On MS-DOS, OS/2 And UNIX New character-based versions of GSS' XVT Extensible Virtual Toolkit are available for MS-DOS, OS/2 and UNIX programmers. XVT allows programmers to support character displays with applications that feature windowing, pull-down menus, dialog boxes, scroll-bars and other graphical user-interface features. The same application source code can support the Windows, PM and Mac GUIs. Versions for Windows, PM, Macintosh and UNIX list for $595. XVT carries no run-time redistribution royalties. The company is located at 9590 SW Gemini Drive, Beaverton, Oregon 97005 (503) 641-2200; FAX: (503) 643-8642. Helios Enhances Proteus System Helios Software has released a new version of its prototype/demo system, Proteus. Proteus v4.5 enables software developers to build functional prototypes, marketing demos, tutorials and other interactive presentations. Version 4.5 offers an integrated environment to build character-based demos and bitmapped demos in any of 23 graphics formats. Both full-screen and overlay images can be displayed, using 26 different video effects. Designers can create screens with the built-in Screen Painter or configure Proteus to execute any external paint program. Captured screens can also be incorporated into demos. Proteus is $199 for a three-disk set, with examples in source code. There is a 30-day money-back guarantee, no royalties for distribution and no sign-on screen. Required hardware configurations depend on the graphics mode used, ranging from monochrome text to super-VGA. The Helios order number is (800) 634-9986, or contact them at P.O. Box 22869, Seattle, WA 98112 (206) 324-7208. 
High C V1.6 Includes 486 Support MetaWare Inc. has released its ANSI-conformant High C compiler v1.6 for 386/DOS on the 80386 and the 80486 in protected mode. Protected mode on the 386 and 486 is supported in conjunction with MS-DOS extenders. Specific support for the 486 is provided under toggle control. MetaWare has also released High C v1.6 for OS/2 and real-mode MS-DOS. Version 1.6 features expanded libraries, new documentation, two editors, a disk cache utility, a B-tree library, and a graphics library for the 80386/486 in protected mode. Users also get MetaWare's new make facility, and DOS Helper, which is a set of UNIX-style utilities for the MS-DOS operating system. This upgrade comes with the GFX/386 Graphics library, produced in conjunction with C Source. GFX for the 80386 is a user-transparent port of the C Source GFX graphics package. The graphics package provides specific floating-point graphics functions; MetaWare is providing additional libraries that support the 80387 and Weitek Abacus. High C also includes the EC editor from C Source, the HyperDisk disk cache from HyperWare, and source code for the MicroEMACS editor. In addition, v1.6 will be bundled with two products from Sterling Castle: the BlackStar/386 "C" Function Library and BPTPlus in C. These products provide data retrieval capabilities and over 300 additional library functions. Sterling Castle's BlackStar/386 C libraries and the GFX/386 Graphics library are available only through MetaWare. Please refer inquiries to MetaWare Incorporated, 2161 Delaware Avenue, Santa Cruz, CA 95060-5706 (408) 429-6382; FAX (408) 429-9273. Prototyping Tools Combined Genesis Data Systems has consolidated its line of prototyping and presentation products, formerly sold as RADs and RPS, into a single system named "ProtoFinish." ProtoFinish is a versatile system for creating prototypes, demos, tutorials and other presentations. 
It includes a screen design module for building ASCII-based screens, a memory-resident utility for capturing text or CGA graphics screens, a music module for adding sound, a flexible 4th-generation language for accurately simulating the look and feel of a program, and a royalty-free run-time utility for distribution. Libraries of assembly language routines, primarily for incorporating screens in C, PASCAL, BASIC, and Clipper code, are included for the programmer. Contact Genesis Data Systems, 8415 Washington Place NE, Albuquerque, NM 87113 (505) 821-9425; FAX (505) 821-9695. LISP Objects Sapiens Software has released a beta test version of its Common LISP Object System (CLOS) implementation. CLOS supports generic functions and methods (rather than message passing), and multiple inheritance of object slots. Star Sapphire CLOS is embedded in the Star Sapphire LISP v3.1 run-time, written in C, which eliminates CLOS loading time. Star Sapphire LISP runs on any PC compatible with 640Kb and a hard disk; extended memory can be used if installed. The product is $99.95 from Sapiens Software Corporation, P.O. Box 3365, Santa Cruz CA 95063 (408) 458-1990. Faircom Offers 'Special Edition' The Faircom Corporation has released a new application development toolbox, which includes the d-tree development environment, file management system and report generation system. Faircom is introducing this product with a $695 "Special Edition" package and a 30-day, no-risk trial offer. For more information, contact Faircom at (800) 234-8180, 4006 West Broadway, Columbia, MO 65203; FAX (315) 445-9698. Oakland Updates Screen Tools Oakland Group, Inc. has released v3.1 of the Look & Feel screen designer and the C-scape interface management system. Look & Feel lets you prototype and simulate screens, and automatically turn screens into C source code that will run across MS-DOS, OS/2, UNIX, and VMS. 
The new version of C-scape allows for total portability, has fewer levels of indirection, and creates smaller executables. MS-DOS and OS/2 versions of C-scape with Look & Feel cost $399, including source code. Look & Feel costs $149; C-scape $299. UNIX versions begin at $999. Look & Feel source code costs $900. For more information, contact Oakland Group, Inc., 675 Massachusetts Avenue, Cambridge, MA 02139 (800) 233-3733 or (617) 491-7311. New Linker Pocket Soft, Inc., has released .RTLink/Plus, an advanced overlay linker which supports debugging of programs with multiple and nested overlays with Microsoft's CodeView debugger. .RTLink/Plus also provides a unique link-time Profiler, which gives a detailed performance analysis in timing intervals which are user-adjustable to thousandths of a second. Pocket Soft is an authorized licensee of Microsoft CodeView information. .RTLink/Plus has a list price of $495 and is available through most common distribution/reseller channels and direct from Pocket Soft, Inc., 7676 Hillmont, Suite 195, Houston, TX 77040 (713) 460-5600. Tool Writes Dialog Box Source Code The Software Organization, Inc. has released DialogCoder, a programming tool that eliminates as much as 95 percent of the coding normally associated with windows dialog box programming. DialogCoder automatically generates C source code from dialog templates to manage all controls in the dialog; it uses graphical metaphors to express the relationships between dialog controls and actions, which eliminates most of the conventional dialog control programming. It also allows users to interactively specify the state of each dialog control during initialization and command processing. DialogCoder requires a 286-or 386-based machine with Windows 2.X. A Microsoft-compatible mouse is optional. DialogCoder is $349. To order, contact the Software Organization, Inc. at (800) 696-2012. 
Trio Releases C-Index/PC Trio Systems has started shipping a new $195 C database library, C-Index/PC. The new product, based on their C-Index/Plus package, allows C programmers to incorporate database features into their applications running under Microsoft Windows, OS/2, and MS-DOS. The C-Index/PC database library supports single-user and multi-user LAN applications with full file management facilities. Complete source code is supplied with C-Index/PC and can be adapted for use with any PC compiler and operating system running on an Intel microcomputer. Product features include: precompiled libraries for Microsoft C and Turbo C, B+Tree indexing, variable-length records, direct and sequential access, and multiple record formats per file. There are no application royalties. For more information, call (818) 798-5567. New Debug Tool Traces Memory References TUITS Inc. has introduced Dr. MD, a run-time memory-tracking utility that finds memory overwrite bugs before an application crashes. Dr. MD catches memory overwrites when they happen. It also catches free()s on invalid pointers, and dangling pointers. Dr. MD will not allow you to overwrite allocated or automatic variables. When Dr. MD finds a problem, it reports the source file and line number where the problem was found, as well as where the space was allocated. No heap walking is needed. Dr. MD comes as source, and you compile it with your own compiler to fit your environment. The vendor claims it should work with any ANSI standard compiler, and that it has worked successfully in MS-DOS and UNIX System V environments. Dr. MD supports all the string library functions as well as memset, memcpy, and limited support of sprintf. Dr. MD sells for $59.95, and includes source code, a manual, and some hints on memory management. For more information contact TUITS Inc., 411 N. Shields, Fort Collins, CO 80521, or call at (303) 224-9070. 
AtLast Offers Overlay Tools AtLast Software has released two new products: Overlay Architect, which automates the process of overlay construction, and Overlay Optimizer, which analyzes the performance of the program's overlay structure, then determines how to rebuild the overlays for the best performance in a given amount of space. AtLast Software will also custom build an overlay structure for developers who do not want to build their own. Overlay Architect sells for $369; Overlay Optimizer for $269. They can be purchased together for $569. Quantity discounts are available. Custom-built structures are priced individually. MicroWay 486 Compilers For C, Pascal & FORTRAN MicroWay has released its 80486-targeted series of compilers, NDP C-486, NDP Fortran-486, and NDP Pascal-486. Each of the NDP-486 compilers includes a "scheduler/code generator" that aligns code and data on paragraph boundaries, detects and minimizes prefetch buffer starving, uses new code sequences that run faster on the 80486 than the 80386, and incorporates a new strategy for driving the Weitek 4167 high-speed coprocessor. They also provide a library of 70 device-independent graphics, keyboard, and sound routines. NDP C-486, NDP Fortran-486, and NDP Pascal-486 generate globally optimized, 32-bit native code that runs in protected mode under UNIX 386 System V v3.0, SCO XENIX 386 v2.3, and Phar Lap extended DOS. The compilers support the 486's built-in FPU and the Weitek 4167 numeric coprocessor. NDP C-486 is a two-dialect compiler that passes 100 percent of the Plum Hall validation suite for UNIX System V C and 95 percent of the tests for the new ANSI C standard. It includes an inline assembly language interface that simplifies the writing of embedded code by allowing the programmer to specify register values and generate interrupts. The MS-DOS, UNIX, and XENIX versions of NDP C-486, NDP Fortran-486, and NDP Pascal-486 retail at $1195 each. The C++ preprocessor lists at $495. 
All of the compilers include one year of free updates. Users should contact MicroWay's Technical Support Staff at (508) 746-7341 for more information. DOS Extender Supports Turbo C Eclipse Computer Solutions, Inc.'s OS/286 MS-DOS extender now supports Borland's Turbo C v2.0 and will soon support Turbo Pascal as well. The MS-DOS extender products of Eclipse Computer Solutions, Inc. (formerly A.I. Architects) exploit the protected-mode operation of the 80286 and 80386 processors and make it possible to create, with conventional development tools, applications that are not restricted by normal MS-DOS memory limits. Contact Eclipse Computer Solutions, Inc., One Intercontinental Way, Peabody, MA 01960 (508) 535-7510; FAX: (508) 535-7512. T & T Enhances Data Junction Tools & Techniques has released Data Junction v3.01. The new version adds formats, an improved user interface, an expanded EZ-Convert mode, speed improvements of 300 percent and more, built-in case translation, and new conversion filters. MS-DOS licenses are $99 for Data Junction: Standard, $199 for Data Junction: Professional, and $299 for Data Junction: Advanced. UNIX/Xenix and LAN licenses start at $495. Data Junction is written in C, and distribution/OEM licenses are also available. For more information, contact Michael Hoskins at Tools & Techniques Inc., 1620 West 12th Street, Austin, TX 78703 (800) 444-1945, or (512) 482-0824. LALR Adds Scanner Generator To Version 3.2 LALR Research has released LALR v3.2, which features the following improvements over v3.0. A lexical scanner generator is included, providing a 10 percent increase in syntax-checking speed over the previous hand-written scanner. An option has been added to generate 0-40 percent smaller parsers. Multiple parsers can exist in an application program. Parsers can read input files of unlimited size. The input grammar format for the new version is fully compatible with previous versions. 
LALR v3.2 is $249 and comes with a 60-day, money-back guarantee. Upgrades from LALR v3.0 are $150. Shipping is $6. For more information, contact LALR Research at PO Box 4722, Chico, CA 95927 (916) 345-0916.

Solbourne Updates OS/MP

Solbourne Computer, Inc., has shipped the latest version of its multiprocessing operating system, OS/MP v4.0A, which is based on SunOS v4.0.1, licensed from Sun Microsystems Inc. OS/MP v4.0A introduces a set of system administration tools to handle user account maintenance, group account maintenance, network group account maintenance, network account maintenance, NFS client maintenance, NFS server configuration, and modem installation. OS/MP v4.0 also includes two new X Window tools. Smail is a user-friendly interface to the standard UNIX mail environment. Sproperty displays the property of any visible X Window. Contact Solbourne Computer, Inc. at 1900 Pike Road, Longmont, CO 80501 (303) 722-3400; FAX: (303) 772-3646.

Belief Maintenance Using The Dempster-Shafer Theory Of Evidence

Dwayne Phillips

The author works as a computer and electronics engineer with the U.S. Department of Defense and is a doctoral candidate in Electrical and Computer Engineering at Louisiana State University. His interests include computer vision, artificial intelligence, software engineering, and programming languages. He first used the Dempster-Shafer theory of evidence in 1984 and uses it extensively in his PhD research into computer vision.

An expert system makes a decision given an amount of evidence. Usually it must choose between several competing answers or hypotheses. The human expert keeps these answers in his mind while he thinks over the problem. He gathers evidence and shifts his thoughts from one answer to another. After gathering evidence, he chooses the most favorable answer. We all do this in our daily decisions, but we don't think about the process, and we certainly don't keep track of specific numbers in our head.
An expert system needs a sub-system to pool evidence and reach decisions: a belief maintenance system. The belief maintenance system keeps track of the hypotheses and the degree of belief attributed to each hypothesis. When the expert system finishes gathering evidence, the belief maintenance system chooses the answer.

In some expert systems a belief maintenance system is not necessary, because they make decisions based on a single, clear-cut piece of evidence. For instance, suppose an expert system has the task of rolling up the windows in your car. The evidence is whether or not it is raining. The system would check the atmosphere and ask, "Is it raining?" If the answer were yes, it would roll up the windows.

In other expert systems a belief maintenance system is essential. Suppose the expert system had to decide at 9:00 AM whether or not to roll up the windows at 3:00 PM. Now the question is tougher. Evidence would include the daily weather forecast, the wind speed and direction, the relative humidity, weather records from past years, forecasts from the Farmer's Almanac, satellite photographs, and other relevant sources. The expert system would pool all the evidence and arrive at an answer.

Consider the nature of evidence. Some evidence is not reliable (the weatherman is sometimes right and sometimes wrong). Some evidence is uncertain (an intermittent atmospheric reading). Some is incomplete (the wind speed by itself does not tell us much). Some evidence is contradictory (the weatherman's forecast and the current atmospheric conditions). Finally, some evidence is incorrect (a broken atmospheric sensor or a wrong weather forecast). The belief maintenance system must deal with these factors, taking the evidence, assigning a measure of belief to each hypothesis, and changing this belief as new evidence becomes available. The resulting decision must be the same regardless of the order in which the system gathers the evidence.
The method of belief maintenance that most of us know is classical probability. The basic properties of this system are [Beyer]:

A) P(∅) = 0 (null set)
B) P(Q) = 1 (entire sample set)
C) P(A) = 1 - P(A')
D) P(A ∪ B) = P(A) + P(B), if A and B are mutually exclusive
E) P(A ∩ B) = P(A) * P(B), if A and B are independent

Another belief maintenance system came from the MYCIN project (a pioneering medical expert system developed in the early seventies by Edward Shortliffe at Stanford). MYCIN used a system of certainty factors to keep track of hypotheses. Shortliffe later dropped the certainty factor system for the Dempster-Shafer theory of evidence.

The Dempster-Shafer (D-S) theory of evidence was created by Glenn Shafer [Shafer, 1976] at Princeton. He built on earlier work performed by Arthur Dempster. The theory is a broad treatment of probabilities, and includes classical probability and Shortliffe's certainty factors as subsets. In the D-S theory of evidence, the set of all hypotheses that describes a situation is the frame of discernment. The letter Q denotes the frame of discernment. The hypotheses in Q must be mutually exclusive and exhaustive, meaning that they must cover all the possibilities and that the individual hypotheses cannot overlap.

The D-S theory mirrors human reasoning by narrowing its focus gradually as more evidence becomes available. Two properties of the D-S theory permit this process: the ability to assign belief to ignorance, and the ability to assign belief to subsets of hypotheses. An example provides the easiest way to understand these properties and how they differ from classical probability. Suppose we want to decide which of three persons in an office -- Adam, Bob, and Carol -- will come in early to turn on the lights and make coffee. In the D-S theory the set Q = {Adam or Bob or Carol}. The sets {Adam}, {Bob}, and {Carol} are the mutually exclusive and exhaustive hypotheses. They are singletons.
In the frame of discernment there are 2^3 = 8 possible interpretations (Figure 1). Figure 1 contains two special sets: {∅} and {Adam, Bob, Carol}. The first is the null set, which cannot hold any value. As later examples will show, the null set normalizes beliefs. The second special set is {Adam or Bob or Carol}, represented by Q. Assigning belief to Q does not help distinguish anything. Therefore, Q represents ignorance.

Representing ignorance is a key concept. Humans often give weight to the hypothesis "I don't know," which is not possible in classical probability. Assigning belief to "I don't know" allows us to delay a decision until more evidence becomes available. This mirrors the human tendency to procrastinate.

Suppose that, given a piece of evidence, we make the assertion shown in Figure 2. The D-S theory calls such an assertion a basic probability assignment. The m in Figure 2 represents the measure of belief. The assertion of Figure 2 says that we believe Adam is the best choice, with a weight of 0.6. We give the other 0.4 of belief to Q or "I don't know," thus allowing us to delay deciding on Adam. We cannot make this type of assertion in classical probability. The classical system's property of complements given earlier forces us to give Adam' (the complement of Adam) 0.4 if we give 0.6 to Adam. In this case Adam' = {Bob or Carol}. Notice the difference between Q = {Adam or Bob or Carol} and Adam' = {Bob or Carol}. Adam' gives more belief to Bob and Carol than we want. Q allows us to express a true "no comment" on the situation.

Assigning belief to subsets in the D-S theory allows us to assign belief to a general concept instead of being too specific. Suppose in our example that the local police advise us that women should not come to work early by themselves. We would make an assertion like the one shown in Figure 3. This assertion gives a weight of 0.7 to the subset {Adam or Bob} and a weight of 0.3 to ignorance, or "no comment."
Classical probability does not permit a subset assertion. Recall that property D requires P(Adam or Bob) = P(Adam) + P(Bob). That property would force us to assign specific beliefs to Adam and to Bob individually. We do not want to be that specific. We want to procrastinate and think it over some more. Also, property C would make us assign the 0.3 to the complement of {Adam or Bob}, which is {Carol}. We do not want to assign 0.3 to {Carol}. Assigning belief directly to {Carol} would contradict the evidence the police gave us.

The D-S theory employs Dempster's rule of combination to combine two assertions. The mathematical formulas may be found in the references. They confuse the best of us, but they are simple when illustrated. Figure 4 shows how the two assertions combine. The table in Figure 4 is an intersection tableau, which lists one assertion across the top and one down the side. Inside the tableau are the intersections of the sets in the rows and columns, with the products of the corresponding beliefs assigned to the intersections. The measures of belief inside the table sum to the final values given below the table. Notice how combination narrows the decision process. The single set {Adam} now has the highest belief. The subset {Adam or Bob} comes in second, with "no comment" last.

Now suppose that we require the first person in the office in the morning to bring up the computer system. Carol is an expert at this, so we make the assertion shown in Figure 5. This attributes most of the belief to {Carol}. This new requirement or piece of evidence contradicts the previous evidence given by the police. That is the nature of evidence. Dempster's rule of combination allows us to combine the contradictory evidence and draw a logical conclusion. Figure 6 shows the combination of the result of Figure 4 and the assertion of Figure 5. This time the intersection tableau contains the null set.
There is no intersection between the set {Carol} and the set {Adam}, and there is also no intersection between the set {Carol} and the set {Adam, Bob}. The null set cannot hold any value. Therefore, it normalizes the beliefs of the other subsets. The sum of the beliefs of the other subsets is divided by one minus the belief in the null set. The beliefs of all the subsets then sum to one. The bottom of Figure 6 shows this extra step. As a result, Carol is now the choice for coming in early in the morning. If she is unable to do so, then Adam is the logical replacement. If Adam is unavailable, then Bob comes in early.

Implementation

The preceding examples show that no complex mathematics is involved in combining two assertions. Dempster's rule of combination uses simple addition, subtraction, multiplication, and division. The only tricky part is computing the intersections of the sets in the tableau. There are several ways to solve the intersection question. Since there are three singletons and 2^3 = 8 total interpretations, we'll represent the hypotheses with three bits, as in Figure 7.

Listing 2 shows the C function that combines two assertions. The inputs are two belief vectors, each holding an assertion. The belief vector is a one-dimensional array of floats. In our examples, LENGTH_OF_BELIEF_VECTOR is eight because we have three singletons and 2^3 = 8. The belief vector has a space, or slot, for each hypothesis, ordered as in Figure 7. The belief vector is awkward to initialize, since we would like Adam in slot one, not slot four, and Carol in slot three, not slot one. Nevertheless, a uniform belief vector allows a very simple subroutine to combine the assertions. The first for loop initializes sum_vector, the belief vector that holds the sums of the values found inside the intersection tableau. sum_vector holds the sums until normalization occurs. The loop over a goes through the belief vectors, finds the intersections, and calculates the products.
The two if ... > 0.0 tests reduce processing time by eliminating unnecessary multiplications by zero. The function uses the C bitwise AND operator & to find the intersection of sets. Without the bitwise AND, the function would be much longer and much more complex. The last for loop performs the normalization. The values in sum_vector are divided by one minus the value assigned to the null set. The answer is stored in vector1.

The combine_using_dempsters_rule function is the meat of the program, written in Turbo C v1.5. I used this compiler because it had a few functions that made the user interface more pleasant. Except for those functions, there is nothing in the program that is machine, compiler, or operating system specific.

One important note about implementing Dempster's rule of combination: the number of calculations depends on 2^|Q|. In our example there were eight hypotheses. By contrast, 200 single hypotheses would produce 2^200 subsets, 2^200 slots in the belief vector, and 2^200 floating point calculations. This gets out of hand rather quickly. Several of the references [Gordon, Shortliffe 1985] [Shafer 1985] [Shafer 1987] deal exclusively with this topic. The discussion and proposed solutions are beyond the scope of this article.

Conclusion

The Dempster-Shafer theory of evidence is one method that an expert system may use to keep score on competing hypotheses while it gathers evidence and draws a logical conclusion. It is more general and capable than the classical probability with which most of us are familiar. It is easy to implement and executes quickly as long as the number of hypotheses is manageable. I suggest you try it on your next expert system or AI-related project.

References

Beyer, William H., CRC Standard Mathematical Tables, 26th edition, CRC Press, 1983, pp. 503-559.

Gordon, Jean, Edward H. Shortliffe, "The Dempster-Shafer Theory of Evidence," pp. 272-292 of Shortliffe, Edward H., Bruce G.
Buchanan, eds., Rule Based Expert Systems, Addison-Wesley Publishing Company, 1984.

Gordon, Jean, Edward H. Shortliffe, "A Method for Managing Evidential Reasoning in a Hierarchical Hypothesis Space," Artificial Intelligence, Vol. 26, No. 3, July 1985, pp. 323-357.

Shortliffe, Edward H., Bruce G. Buchanan, eds., Rule Based Expert Systems, Addison-Wesley Publishing Company, 1984.

Shafer, Glenn, A Mathematical Theory of Evidence, Princeton University Press, 1976.

Shafer, Glenn, "Hierarchical Evidence," The Second Conference on Artificial Intelligence Applications, IEEE Press, December 1985, pp. 16-21.

Shafer, Glenn, Roger Logan, "Implementing Dempster's Rule for Hierarchical Evidence," Artificial Intelligence, Vol. 33, No. 3, November 1987, pp. 271-298.

Figure 1  Frame of Discernment for the Case of Adam, Bob, and Carol

    {Adam, Bob, Carol}
    {Adam, Bob}  {Adam, Carol}  {Bob, Carol}
    {Adam}  {Bob}  {Carol}
    {∅}

Figure 2  An Assertion Showing the Use of Ignorance

    m{Adam} = 0.6
    m{Q}    = 0.4

Figure 3  An Assertion Showing Belief Assigned to a Subset

    m{Adam, Bob} = 0.7
    m{Q}         = 0.3

Figure 4  Combining Two Assertions Using Dempster's Rule of Combination

Figure 5  A New Assertion

    m{Carol} = 0.9
    m{Q}     = 0.1

Figure 6  Combining the Result of Figure 4 with Figure 5

Figure 7  Using Three Bits to Represent the Hypotheses

    bits    hypothesis
    000     {∅}
    001     {Carol}
    010     {Bob}
    011     {Bob, Carol}
    100     {Adam}
    101     {Adam, Carol}
    110     {Adam, Bob}
    111     {Adam, Bob, Carol} or {Q}

Listing 1

/*******************************************************************
 * file d:\tc\cujds.c
 *
 * Functions: This file contains
 *    main
 *    display_belief_vector
 *    clear_belief_vector
 *    enter_belief_vector
 *    combine_using_dempsters_rule
 *
 * Purpose:
 *    This program demonstrates how to implement Dempster's
 *    rule of combination.
 *
 * NOTE: This is written for Borland's Turbo C
 *    Version 1.5. This allows us to use some
 *    nice user interface functions. The actual
 *    combination code is compiler independent.
 *******************************************************************/

extern unsigned int _stklen = 40000;

#include "d:\tc\include\stdio.h"
#include "d:\tc\include\io.h"
#include "d:\tc\include\fcntl.h"
#include "d:\tc\include\dos.h"
#include "d:\tc\include\math.h"
#include "d:\tc\include\graphics.h"
#include "d:\tc\include\conio.h"
#include "d:\tc\include\sys\stat.h"

#define LENGTH_OF_BELIEF_VECTOR 8

main()
{
   char  response[80];
   int   choice, i, j, not_finished;
   short place;
   float a[LENGTH_OF_BELIEF_VECTOR], belief,
         v[LENGTH_OF_BELIEF_VECTOR];

   textbackground(1);
   textcolor(7);
   clrscr();

   not_finished = 1;
   while(not_finished){
      clrscr();
      printf("\n> You may now either:");
      printf("\n     1. Start the process");
      printf("\n     2. Enter more assertions");
      printf("\n     3. Exit program");
      printf("\n  _\b");
      get_integer(&choice);

      switch(choice){

         case 1:
            clear_belief_vector(v);
            clear_belief_vector(a);
            clrscr();
            enter_belief_vector(v, 1);
            clrscr();
            enter_belief_vector(a, 1);
            clrscr();
            printf("\n> Initial Belief Vector\n");
            display_belief_vector(v);
            printf("\n> Second Belief Vector\n");
            display_belief_vector(a);
            combine_using_dempsters_rule(v, a);
            printf("\n> Resultant Belief Vector\n");
            display_belief_vector(v);
            break;

         case 2:
            clrscr();
            clear_belief_vector(a);
            enter_belief_vector(a, 1);
            clrscr();
            printf("\n> Initial Belief Vector\n");
            display_belief_vector(v);
            printf("\n> Second Belief Vector\n");
            display_belief_vector(a);
            combine_using_dempsters_rule(v, a);
            printf("\n> Resultant Belief Vector\n");
            display_belief_vector(v);
            break;

         case 3:
            not_finished = 0;
            break;

      }  /* ends switch choice */
   }  /* ends while not_finished */
}  /* ends main */



clear_belief_vector(v)
   float v[];
{
   int i;
   for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++)
      v[i] = 0.0;
}  /* ends clear_belief_vector */



display_belief_vector(v)
   float v[];
{
   char response[80];
   int  i, j;

   j = 0;
   for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++){
      if(v[i] > 0.0001){
         printf("   [%3d]=%6f", i, v[i]);
         j++;
      }
   }
   printf("\n   Hit RETURN to continue");
   read_string(response);
}  /* ends display_belief_vector */



enter_belief_vector(v, line)
   float v[];
   int   line;
{
   int   i, not_finished, y;
   float value;

   y = line;
   printf("\n> ENTER BELIEF VECTOR");
   printf("\n>   Enter the place (RETURN) and value (RETURN)");
   printf("\n>   (Enter -1 for place when you're finished)");

   not_finished = 1;
   while(not_finished){
      printf("\n   [__]=___");
      y = wherey();
      gotoxy(5, y);
      get_integer(&i);
      gotoxy(10, y);
      get_float(&value);
      if(i != -1){
         v[i] = value;
      }  /* ends if i != -1 */
      else
         not_finished = 0;
   }  /* ends while not_finished */
}  /* ends enter_belief_vector */



/***************************************************************
 *
 * This is the function that implements Dempster's rule
 * of combination.
 * vector1 holds the original beliefs and will hold the
 * result of the combination.
 *
 ***************************************************************/

combine_using_dempsters_rule(vector1, vector2)
   float vector1[LENGTH_OF_BELIEF_VECTOR],
         vector2[LENGTH_OF_BELIEF_VECTOR];
{
   float denominator, sum_vector[LENGTH_OF_BELIEF_VECTOR];
   int   a, i, place;

   /* set the sums to zero */
   for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++)
      sum_vector[i] = 0.0;

   for(a=0; a<LENGTH_OF_BELIEF_VECTOR; a++){
      if(vector2[a] > 0.0){
         for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++){
            place = i & a;    /* bitwise AND = set intersection */
            if(vector1[i] > 0.0)
               sum_vector[place] = (vector1[i] * vector2[a])
                                      + sum_vector[place];
         }  /* ends loop over i */
      }  /* ends if vector2[a] > 0.0 */
   }  /* ends loop over a */

   denominator = 1.0 - sum_vector[0];
   for(i=1; i<LENGTH_OF_BELIEF_VECTOR; i++)
      vector1[i] = sum_vector[i]/denominator;
}  /* ends combine_using_dempsters_rule */



#define is_digit(x)   ((x >= '0' && x <= '9') ? 1 : 0)
#define is_blank(x)   ((x == ' ') ? 1 : 0)
#define to_decimal(x) (x - '0')
#define NO_ERROR  0
#define IO_ERROR -1
#define NULL2    '\0'

get_integer(n)
   int *n;
{
   char string[80];
   read_string(string);
   int_convert(string, n);
}

int_convert(ascii_val, result)
   char *ascii_val;
   int  *result;
{
   int sign = 1;   /* -1 if negative */

   *result = 0;    /* value returned to the calling routine */

   /* read past blanks */
   while (is_blank(*ascii_val))
      ascii_val++;              /* get next letter */

   /* check for sign */
   if (*ascii_val == '-' || *ascii_val == '+')
      sign = (*ascii_val++ == '-') ? -1 : 1;   /* find sign */

   /*
    * convert the ASCII representation to the actual
    * decimal value by subtracting '0' from each character.
    *
    * for example, the ASCII '9' is equivalent to 57 in decimal.
    * by subtracting '0' (or 48 in decimal) we get the desired
    * value.
    *
    * if we have already converted '9' to 9 and the next character
    * is '3', we must first multiply 9 by 10 and then convert '3'
    * to decimal and add it to the previous total yielding 93.
    */
   while (*ascii_val)
      if (is_digit(*ascii_val))
         *result = *result * 10 + to_decimal(*ascii_val++);
      else
         return (IO_ERROR);

   *result = *result * sign;
   return (NO_ERROR);
}

get_short(n)
   short *n;
{
   char string[80];
   read_string(string);
   short_convert(string, n);
}

short_convert(ascii_val, result)
   char  *ascii_val;
   short *result;
{
   int sign = 1;   /* -1 if negative */

   *result = 0;    /* value returned to the calling routine */

   /* read past blanks */
   while (is_blank(*ascii_val))
      ascii_val++;              /* get next letter */

   /* check for sign */
   if (*ascii_val == '-' || *ascii_val == '+')
      sign = (*ascii_val++ == '-') ? -1 : 1;   /* find sign */

   /* (conversion works as described in int_convert above) */
   while (*ascii_val){
      if (is_digit(*ascii_val)){
         *result = *result * 10 + to_decimal(*ascii_val++);
         if( (sign == -1) && (*result > 0))
            *result = *result * -1;
      }
      else
         return (IO_ERROR);
   }  /* ends while ascii_val */
   return (NO_ERROR);
}

get_long(n)
   long *n;
{
   char string[80];
   read_string(string);
   long_convert(string, n);
}

long_convert(ascii_val, result)
   char *ascii_val;
   long *result;
{
   int sign = 1;   /* -1 if negative */

   *result = 0;    /* value returned to the calling routine */

   /* read past blanks */
   while (is_blank(*ascii_val))
      ascii_val++;              /* get next letter */

   /* check for sign */
   if (*ascii_val == '-' || *ascii_val == '+')
      sign = (*ascii_val++ == '-') ? -1 : 1;   /* find sign */

   /* (conversion works as described in int_convert above) */
   while (*ascii_val)
      if (is_digit(*ascii_val))
         *result = *result * 10 + to_decimal(*ascii_val++);
      else
         return (IO_ERROR);

   *result = *result * sign;
   return (NO_ERROR);
}

get_float(f)
   float *f;
{
   char string[80];
   read_string(string);
   float_convert(string, f);
}

float_convert(ascii_val, result)
   char  *ascii_val;
   float *result;
{
   int count;        /* # of digits to the right of the
                        decimal point. */
   int sign = 1;     /* -1 if negative */
   double pow10();   /* Turbo C function */
   float  power();   /* function returning a value raised to
                        the power specified. */

   *result = 0.0;    /* value desired by the calling routine */

   /* read past blanks */
   while (is_blank(*ascii_val))
      ascii_val++;              /* get the next letter */

   /* check for a sign */
   if (*ascii_val == '-' || *ascii_val == '+')
      sign = (*ascii_val++ == '-') ? -1 : 1;   /* find sign */

   /*
    * first convert the numbers on the left of the decimal point.
    * if the number is 33.141592 this loop will convert 33.
    * (conversion works as described in int_convert above)
    */
   while (*ascii_val)
      if (is_digit(*ascii_val))
         *result = *result * 10 + to_decimal(*ascii_val++);
      else if (*ascii_val == '.')   /* start the fractional part */
         break;
      else
         return (IO_ERROR);

   /*
    * find the number to the right of the decimal point.
    *
    * if the number is 33.141592 this portion will return 141592.
    *
    * by converting a character and then dividing it by 10
    * raised to the number of digits to the right of the
    * decimal place, the digits are placed in the correct
    * locations.
    *
    *    4 / power(10, 2)  ==>  0.04
    */
   if (*ascii_val != NULL2) {
      ascii_val++;   /* move past the decimal point */
      for (count = 1; *ascii_val != NULL2; count++, ascii_val++)
         /*************************************************
          * The following change was made 16 June 1987.
          * For some reason the power function below
          * was not working.  Borland's Turbo C pow10
          * was substituted.
          *************************************************/
         if (is_digit(*ascii_val)){
            *result = *result +
               to_decimal(*ascii_val)/((float)(pow10(count)));
            /***********
            *result = *result +
               to_decimal(*ascii_val)/power(10.0, count);
            ************/
         }
         else
            return (IO_ERROR);
   }

   *result = *result * sign;   /* positive or negative value */
   return (NO_ERROR);
}

float power(value, n)
   float value;
   int   n;
{
   int   count;
   float result;

   if(n < 0)
      return(-1.0);

   result = 1;
   for(count=1; count<=n; count++){
      result = result * value;
   }
   return(result);
}

Listing 2  C Code to Implement Dempster's Rule of Combination

/*
 * This is the function that implements Dempster's rule
 * of combination.
 * vector1 and vector2 are belief vectors.  vector1 will
 * hold the result of the combination.
 */

#define LENGTH_OF_BELIEF_VECTOR 8

combine_using_dempsters_rule(vector1, vector2)
   float vector1[LENGTH_OF_BELIEF_VECTOR],
         vector2[LENGTH_OF_BELIEF_VECTOR];
{
   float denominator, sum_vector[LENGTH_OF_BELIEF_VECTOR];
   int   a, i, place;

   /* set the sums to zero */
   for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++)
      sum_vector[i] = 0.0;

   for(a=0; a<LENGTH_OF_BELIEF_VECTOR; a++){
      if(vector2[a] > 0.0){
         for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++){
            place = i & a;    /* bitwise AND = set intersection */
            if(vector1[i] > 0.0)
               sum_vector[place] = (vector1[i] * vector2[a])
                                      + sum_vector[place];
         }  /* ends loop over i */
      }  /* ends if vector2[a] > 0.0 */
   }  /* ends loop over a */

   /* normalize by the belief assigned to the null set */
   denominator = 1.0 - sum_vector[0];
   for(i=1; i<LENGTH_OF_BELIEF_VECTOR; i++)
      vector1[i] = sum_vector[i]/denominator;
}  /* ends combine_using_dempsters_rule */