We Have Mail

Dear Mr. Ward:

I am not much of a letter writer, but after reading the July '89 issue of The C Users Journal I felt I could save some of your readers a lot of time tracking down a problem with the Microsoft C, version 5.10 memory allocation routines. Enclosed is a listing and the output from the program. This may help Steven Isaacson, who is having memory allocation problems using Vitamin C.

I found this problem after a week of tracking down a memory leak in a very large application. My final solution was to write my own malloc()/free() routines that call DOS directly, letting the DOS allocator do what it is supposed to do. No time penalty was noticed in our application. Note: if you do write your own malloc()/free() routines, call them something else! MSC uses these routines internally and makes assumptions about what data is located outside the allocated area. I always use a malloc()/free() shell to test for things like memory leaks and the freeing of a non-allocated block. It also gives you an easy way to install a global 'out of memory' error handler.

The code supplied by Leonard Zerman for finding the amount of free space in a system is simplistic and very limited. A better routine would build a linked list of elements; the variable vptrarray could then be made a single pointer to the head of the list. The entire routine becomes dynamic and much more robust, and there is no danger of overflowing a statically allocated array. See the supplied code for an example. The linked list implementation has the side effect that it will work on a virtual memory system. Why you would want to do this is beyond me, but it could be considered a very time-consuming way to find out what swapmax is set to on a UNIX system.

If you have any questions, please contact me. My phone number is (408) 988-3818. My fax number is (408) 748-1424.
Sincerely yours,
Jim Schimandle
Primary Syncretics
473 Sapena Court, Unit #6
Santa Clara, CA 95054

Thanks for the information. We've included your code in Listing 1. -- rlw

Dear Mr. Ward:

I'm new to programming and need to extract information from old mainframe files. Each file has its own annoying attributes. Some files are reports for printing on 132-column paper, with headers on each page along with errors in tabulation and decimal point alignment. I'd like to know enough about grep, awk, sed, and tr so that I'm not reinventing the wheel with my C programs for file manipulation. Where can I find an understandable and brief overview of these UNIX tools? (I know nothing about regular expressions, scanning, and syntactic analysis.)

Sincerely,
Orion C. Whitaker, M.D.
400 Brookline Ave., #22F
Boston, MA 02215

I suggest The UNIX Programming Environment by Kernighan and Pike. This tidy little book does more to explain how the tools work and work together than any other book I've seen. While it's insightful, it's also a good teaching text. You should also consider The AWK Programming Language by Aho, Kernighan, and Weinberger (the A., W., and K. in awk). If our readers know of other texts that do a good job of explaining how to use the UNIX language-oriented tools, I'd like to hear about them. -- rlw

Thank you for your letter/brochure. First, I have some questions. I studied BASIC last semester at community college and would now like to learn C. My major problem is MY computer. I have a Commodore 64 with 256K RAM expansion and plan to use Abacus Software's Super C Compiler 64. I am a retiree with little prospect of buying a new computer.

1. Do you offer much in this format, or am I butting my head against a wall?

2. Would it be practical for me to attend a class where they are using, probably, IBM compatibles, and do my homework on my system? Would work developed on my system operate on "IBM"s? The disks are not compatible, but could my work be 'retyped' into the "IBM"?
I have Standard C by Plauger & Brodie, and Transactor Magazine has articles which look like they will be useful when I learn more.

Les Maynard
P.O. Box 915
Palmer, AK 99645

Unfortunately, we can't write Commodore disks. However, it's my understanding that if you have the right Commodore drive you can get a program that will let you read MS-DOS disks directly. Whether you can do your C homework on your Commodore depends on several things:

1) Is your instructor willing to accept Commodore output? If you have to run your work on an MS-DOS host to make it acceptable, it probably won't work.

2) What subjects and exercises will the class focus upon? If writing directly to the IBM video display is one of the exercises, it probably isn't reasonable for you to try to work along on the Commodore. If, on the other hand, the class confines itself to general, portable language features and concepts, you will have less trouble.

3) How adept are you at researching your own system? At some point (probably several points), a classroom illustration isn't going to work on your machine. It really isn't fair to expect the instructor to research the problem for you. Can you find your own way?

4) Is your Commodore implementation complete enough to support the scope of the class? Will you be asked to write programs that exceed its memory space? Will you need doubles? Will the exercises require elaborate preprocessor capabilities?

At the very least you should have a serious talk with the instructor before you enroll. Whether work you develop will run on an IBM depends entirely upon the code. If you confine yourself to generic file processing and discipline yourself to avoid, or at least properly hide, any Commodore peculiarities, then your code should run in the IBM environment. (You might find some helpful ideas in Rainer Gerhards' story in this issue.) Please note these are major ifs, even for very experienced C programmers.
-- rlw

Dear CUG,

I am writing to warn you and other users of the problems I have found with LEX parts 1 and 2 on disks number 172 and 173. The program generates code which crashes the system when run.

The problem is in llstin(). If _tabp is NULL, it assigns it the return value of lexswitch(). lexswitch() returns a pointer to the previous table, which is NULL when first called. The result is _tabp being set to NULL forever. Since this table contains pointers to functions, the program jumps off to an unknown address. The source code that was provided will NOT generate this code, indicating that the .exe file was not built from this source! So I rebuilt it and, in testing, found the new .exe produced different tables than the release program did.

There are various solutions to this problem. One is to set _tabp to the location of the table in the .lxi. Another is to edit the generated source file each time, removing the assignment to _tabp in llstin(). Or you could change lexswitch() to return the new value. I don't like the last one because all the documentation states the return value is a pointer to the previous table. Since I am using the -s option, I edit the file, as there is another problem with that option.

The problem with the -s option may only exist with Microsoft C. llstin() is declared as void at the beginning. The function itself is NOT. The compiler produces a diagnostic error. With the incorrect source, the only way around this is to edit the file. (A REAL PAIN if you are using a make file to build the final program.)

I also have a copy of "Bison". It has worked very well with one exception. I found I had to include stdlib.h in simple.prs in order to get rid of several warning messages under certain conditions. One might include it inside the .y file instead. By placing it inside simple.prs I don't have to remember to put it inside the .y. In general, I've found bison to be GREAT. Keep up the good work, and good luck.
Sincerely,
Frank Veenstra
24797 Metric Dr.
Moreno Valley, CA 92387

Yes, the .exe and source files are out of phase. We'll test your fix and remaster the volume with the fix. When we have a new master we'll announce an upgrade in the New Releases column. Thanks for the help. -- rlw

Mr. Robert Ward:

In the May 1989 issue of The C Users Journal, Timothy Prince presented a rather eloquent and detailed article entitled "Efficient Matrix Coding in C". However, I would like to bring to your attention (excuse me if someone already has) an error in that article. Mr. Prince asserts the following to be true:

a[i][j] = *( &a[0][0] + i * I + j )

when given the declaration:

float a[I][J] ;

C stores array elements in row-major order, not in column-major order as suggested above. The valid identity is:

a[i][j] = *( &a[0][0] + i * J + j )

for the given declaration. All the elements of row a[0][.] are located at lower addresses than the first element of row a[1][.], which is stored right after array element a[0][J-1]. Consequently, to access a[i][j], it is necessary to skip i rows of J elements each, plus the j elements before the desired element.

I would also like to take this opportunity to commend you and your staff on producing a Journal that is technically superior to all the other, superficial computer magazines that I have read. That May issue was my first copy of The C Users Journal and it certainly will not be my last.

Sincerely yours,
Girish T. Hagan
27401 Via Olmo
Mission Viejo, CA 92691

Ah yes, the hazard of too much FORTRAN and Pascal. Thanks for correcting our slip -- and thanks for the kind words. -- rlw

Dear Robert,

I have been a member of the C Users' Group for quite a long time now, around the seven-to-eight-year mark. Over this period I have kept all of your newsletters and your present The C Users Journal publications. I have watched the evolution of the Journal with great interest.
During your 'early days' I often reread some of the newsletters when I needed information on a particular piece of code, or on a bug which another member had discovered. But time seems to compress as you get older. These days I rarely have the time to re-read articles unless it is important that I do so.

WHY is he telling me this... do I hear you ask? Well, I hope I have set the scene properly, because I assume you have many more readers than just Phil Cogar who have difficulty finding enough time to squeeze in their preferred reading. Professionals in any line of work tend to be busy people.

Which brings me to the August issue of the Journal and, specifically, the article by Denis Schrader on the FOR_C Translator. Not that I am at all interested in FORTRAN-to-C translators, but I always read the Journal from cover to cover, and I hope my comments will assist in raising the standard of the Journal even further. With respect to Denis Schrader, who I hope does not take offence that I have selected his article to point out what I believe is wrong with some of the User Reports, I would like to direct your attention to this article with the plea that you consider setting certain standards for authors to write to for future User Reports.

So, and without wishing to offend Denis, let me start by asking you to instruct your authors to make their reviews complete (or as complete as they can in the circumstances) as they stand. Don't presume the reader either has access to, or the inclination to look up, an earlier review. Of course rules are meant to be broken, so you might give a reference to something written within the previous several months, but I suggest two years is a bit too long. I refer here specifically to the words: "...which I reviewed in the August 1987 issue... However, comments in this review will point out improvements which have occurred since the release of earlier versions of the product."

Point 2: back up specific comments with specific information.
For example, if you say, "The translator will pay for itself quickly in saved programmer hours," then you should also say how much it costs, both the list price and, if you know it, the street price.

Point 3: if we are talking about a specific product, then either cut out or cut down on the generalisations. An example of this is the comment (statement?), "The translator translates almost 100 percent of ANSI Standard FORTRAN as well... extensions." If the reader is reading the User Report because he or she wants to be better informed about the product (and isn't that the purpose of the User Report?), then, in this case, the comment is of little use unless we are told:

- Whether this (the non-translation of the FORTRAN code) is a transient thing. In other words, do you have to check each piece of translated code for small errors (perhaps for large errors... I don't know, and the Review doesn't say) which might translate to bugs in C; or

- Whether this is systematic, and the FOR_C translator only fails to translate certain pieces of FORTRAN code properly into C. In such a case, does the translator 'flag' the offending pieces of code so they can be corrected using the recommended, known conversion; or

- If your translated C code compiles without the compiler complaining to you, does this mean the code is a 1-for-1 translation of the FORTRAN routine, or not; and so on.

It seems to me that a generalised comment of the type mentioned above does little (nothing?) to better inform the reader about the merits or otherwise of the product.

Point 4: comparisons are odious (or so we are told), but they seem to abound in product reviews. My point is that partial comparisons tend more to mislead the reader than to inform him/her. In other words, we are talking here about a product which translates FORTRAN code into C code. We are not told WHY it is desirable to do this if you already have good, de-bugged FORTRAN routines you wish to incorporate into C programmes.
Please correct me if I have got it wrong but, as I understand the situation, the Microsoft family of microcomputer languages allows you to generate files compiled in BASIC, C, Pascal, FORTRAN, and assembler, any and all of which can be linked into a run-time file as required. I am (most certainly) NOT an apologist for Microsoft, but I do suggest a reviewer has not properly informed the reader as to the merits or otherwise of the product without at least canvassing other alternatives. If Microsoft, for example, have a family of languages which can do the job in another way (you'll notice I didn't say 'a better way' because I don't have a clue which is the better approach; the Review didn't tell me), then the Reviewer should at least mention this. In other words, alert the reader to other possible alternatives, at least. The preferred option would be to make a comparison between the competing products and compare features, strengths, and weaknesses.

So there it is. In summary, my four points are:

Point 1: Make the review as complete as possible in the space allowed. Don't ask the reader to look up other references. We aren't dealing with a scientific paper, just a product review.

Point 2: Give specific (factual) backup to specific comments. It's not that we don't trust reviewers to be objective, but we are discussing opinions here, and my opinion may well differ from the Reviewer's if I am given the opportunity to see what his/her opinion was based on.

Point 3: Leave out generalisations, at least if we are discussing one specific product. Generalisations are OK if we are discussing a 'family' of products. Who was it said, 'All generalisations are false'? Or perhaps I got that wrong?

Point 4: If you believe comparisons (with products from other sources) make the review stronger, then by all means put in the comparisons... but at least try to cover the best alternatives to the product being reviewed. Anything less and you are misleading your readers.
I know it has been tedious, but that's all I wish to say on the subject for the moment. Perhaps you will find something here to put before future Product Reviewers when they submit their articles. My hope is that I have sparked a debate which will lead to an even higher standard for what is already a fine publication.

Yours sincerely,
Phil E. Cogar
P.O. Box 364, Narrabeen, N.S.W.
Australia 2101

I find myself in complete agreement with your four points. I'm sorry the FOR_C article didn't measure up. Generally I'd just as soon do without "reviews". That's why we've used the label "User Reports". I don't really care if someone gave the product four stars -- I want to know what it's like to use the product. Will it require some changes in my work habits? Does it seem to fit a certain design style better than others? Are certain unobvious tricks necessary to achieve certain goals? If someone has spent enough time with a product to be qualified to evaluate it for other experienced programmers, then that person has also learned several things that aren't in the manuals. Why should I have to relearn those items if I decide to buy the product? The writer should give me the full vicarious benefit of his experience.

Here are some of my guidelines for anyone interested in writing a product-related story:

Don't try to sell the product or your philosophy of how products should be designed, tested, marketed, packaged... whatever. Instead, tell us what it does and doesn't do.

Keep the opinions to a minimum. If you give intelligent, experienced readers access to the facts that produced your opinion, they'll reach a similar (or at least reasonable) opinion on their own.

Don't be cute. I don't care how entertaining you think your struggle to remove the shrink wrap was; I don't want to waste time reading about it.

Don't guess. If you aren't certain about a particular issue, either find out or don't mention it.

Don't just list features. That's the role of vendor literature.
Do share all you learned in working with the product. If you include information inappropriate to my audience, I can edit it out. I can't edit in information. I'm acutely aware that we very seldom get product-related copy that fully measures up to these guidelines. We're always working on getting better copy. -- rlw

To The C Users Group:

I am disheartened at the lack of truly advanced, pioneering books in C programming, particularly those of a scientific nature. Numerical Recipes in C and Numerical Software Tools in C are the only two that I have heard of, and they are primarily algorithmic books without instruction. Everyone seems to be publishing the same linked lists, the same databases, and the same TODO lists, just as in assembly language books one gets the same RAM disks, disk caches, and clocks. And that is not just book publishers, either; journals and magazines are doing the same thing. I cannot believe that the programming community lacks such expertise. When will publishers realize that enough is enough and start producing books and articles of a truly advanced nature, like the one you ran on the Fast Walsh Transform? It is also time for a complete numerical methods book written for C programming with a common compiler (MSC, TC), with full descriptions as one would receive in a numerical methods course at a university.

Sincerely,
Jerry Rice
504 Eastland St.
El Paso, TX 79907

Maybe some qualified author (with a willing publisher) will hear your plea. Why do publishers publish the same material over and over? Perhaps because it sells. One of our earlier issues (with several stories covering the fundamentals of device drivers) remains one of our most popular back issues. Perhaps device drivers are old hat to you, but to many they remain a mystery.
Most of our readers are expert programmers; they just aren't all expert in the same areas. -- rlw

Using Header Files To Enhance Portability

Rainer Gerhards

Rainer Gerhards specializes in systems programming and has a strong interest in C. He has written some large-scale control systems and many small utilities in C. He owns his own small software company in addition to managing the computing center of a mid-sized company. He may be contacted at Petronellastrasse 6, 5112 Baesweiler, West Germany.

C is known for its efficient code, rich set of features, and portability. While portability is not built in, you can avoid possible portability problems by anticipating them. Let's look at a few problem areas, suggest some solutions, and examine one method in detail.

One important portability issue is the C dialect that your compiler implements. Although there have always been C language standards, until recently they have been too imprecise to preclude varying interpretations. Early, less powerful machines also forced compiler writers to limit features, contributing additional variant dialects. Thus, some compilers can't understand valid C code if it contains features they don't support. Bit fields are a good example: a number of modern compilers still don't support them. Of course, you could avoid using bit fields, but what if you write for one compiler which doesn't support structure and union assignment and for several others which do? You might avoid these constructs too, but would you prefer to learn, while porting a 50,000-line program which makes extensive use of structure assignment, that the environment to which you're porting doesn't support it? The challenge is to know which features to avoid. Now nearly all commercially-used compilers support C in its entirety, but these compilers offer extra features, especially in the preprocessor area.
Though you may simply avoid these features, you may not know which features are non-standard, especially if you are new to C or if you work in just one environment. Some compiler vendors don't flag such features.

Even an experienced C programmer determined to avoid the problems outlined above by using only standardized constructs still faces the difficulty of deciding which "standard" to use: the original Kernighan and Ritchie (K&R) standard defined in The C Programming Language, or the forthcoming ANSI standard. The ANSI standard resolves many portability problems not addressed by K&R and provides a good base for the future. The ANSI standard is mostly upwardly compatible with K&R; most K&R programs can be moved to ANSI compilers without any problems. But moving code in the opposite direction (from ANSI to K&R) requires special preprocessor tricks, which I'll describe later.

The standard library poses similar problems. Compiler writers have restricted and extended the library rather than the language. Some compilers don't even have a standard library; many libraries include numerous extensions. MS-DOS compilers in particular tend to offer extensions covering graphics, interrupts, and operating system interfaces. Porting code which uses one compiler's extensions to a different compiler can be very difficult.

Operating system differences, because they are the hardest to hide, are among the hardest subjects to address. Operating systems differ greatly -- some are multi-tasking, some are multi-user, and some are single-tasking systems. File-naming conventions are anything but standardized. And these problems are minor compared to the variations in file organization. For example, while most operating systems consider text files to have variable-length records (if any), some use fixed-length records. Records may be delimited by \n, \r\n, or record-length fields. Some OSs use special blocking mechanisms, others don't.
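The ANSI-to-K&R problem mentioned above is often bridged with a prototype macro. Here is a sketch of the general technique (the macro name PROTO is my own, not from the article's Listing 1; environ.h uses its own constants):

```c
#include <assert.h>

/* If __STDC__ is defined, the compiler accepts ANSI prototypes;
 * under a K&R compiler the argument list is expanded away.
 * Note the double parentheses at the call site, which make the
 * whole argument list a single macro argument. */
#ifdef __STDC__
#define PROTO(args) args
#else
#define PROTO(args) ()
#endif

/* One declaration serves both dialects: */
extern double average PROTO((double x, double y));

double average(double x, double y)
{
    return (x + y) / 2.0;
}
```

An ANSI compiler sees the full prototype and can check argument types; a K&R compiler sees only `extern double average();` and compiles on unbothered.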
Fortunately, most standard libraries can hide these differences, but only by distinguishing between text and binary mode, introducing subtle, non-standard features. In addition to processing files, the operating system must support some kind of interaction with the user, which leads to additional problems if you use special system features like asynchronous communication or sophisticated display manipulation.

Hardware differences can cause programs that compile and link without error, and run well in one environment, to crash in another. Often these problems are caused by different word lengths. It's hard for a UNIX programmer working with the portable C compiler (PCC) on a 68xxx machine to learn that the same PCC on 80x86-based machines uses 16 instead of 32 bits for integers. A program that uses integers to index some two million database records on a 68xxx machine may require a major rewrite before it can access more than 32,767 records on the 80x86 machine.

Hardware differences can also affect the portability of pointer casts. Many programmers assume that pointers can simply be cast from one type to another -- a reasonable assumption on most byte machines. However, on word machines (like the Unisys 1100), pointers to word-aligned items differ significantly from pointers to non-aligned items. This is true for some so-called byte machines too. Still other problems arise when you port code from a machine with a segmented address space to one with a linear address space.

The last problem is machine resources. Many programmers assume that if their code is portable and standardized, their program will run on all machines supporting a standard C compiler. While this is basically true, some programs require so much memory or processing time that they simply can't be run on some smaller machines.

Designing For Portability

In spite of these problems, it is possible to write C programs that can be compiled and executed in different environments.
To be portable, a program must be designed and coded in a fashion that hides environmental differences. C's own design hides many such differences. The standard library is a successful attempt to hide some very environment-specific information -- such as the way file system (and some other) calls are made on the target operating system. Without the standard library, every programmer would have to write that interface code himself. Even worse, he would have to rewrite it again and again for each new environment.

You can hide other large environmental differences by creating your own "standard libraries" for other tasks: extract the non-portable operations into a separate source module, define a general interface for this module, and build a different implementation for every environment you want to work with. Many of the high-quality portable support library products available do this for you. Such a library provides "instant" portability, lower cost, and more functionality than an equivalent product written by a single programmer.

While system-specific libraries are appropriate for horrible, non-portable tasks like dealing with the user console, a standardized function call might not make sense for smaller tasks which require only slightly different coding in limited areas of the source. For example, it would not make sense to define a one-line function to set a signal handler under one environment only, especially if the signal handler is called from inside a tight loop where the calling overhead could cause performance problems. The C preprocessor is the obvious tool for these smaller coding differences: just use conditional compilation to enable the code which sets the signal handler in the one environment where it's needed. You don't have to define a large number of functions, and there is no unnecessary calling overhead. The preprocessor can also help solve problems that arise simply because different names are used for the same thing.
For example, nearly every MS-DOS compiler uses its own names for the machine-level i/o (port) functions (for example inp and outp versus inportb and outportb). Fortunately, these functions have the same calling conventions. In this situation, rather than wrap every function call in conditional compilation, use conditional compilation just once to define a macro that in turn calls the function with the right name. Everywhere else, the code uses the macro to call the function.

Macro and constant definitions can also completely hide slight differences in standard library parameters. For example, when working under two different operating systems whose standard libraries have different open modes for text and binary files, you could open a binary file for writing with the call

fp = fopen("file", OPM_WB);

Under UNIX, OPM_WB would be defined as "w" and the call would expand to

fp = fopen("file", "w");

Under MS-DOS (Microsoft C), OPM_WB would be defined as "wb" and the call would expand to

fp = fopen("file", "wb");

Sometimes a simple define can also hide significant hardware differences. Different data type sizes can be hidden by defining your own data types with a guaranteed minimum precision. For example, type int32 (an integer containing at least 32 bits) would be mapped to int for 68xxx machines and to long for 80x86 machines. If int32 has been used in every spot requiring a 32-bit integer, nothing but the definition needs to be changed when porting. (Note that a data type redefinition can be done either with the preprocessor or with a compiler typedef. While the former is potentially more portable, so far I have not seen a compiler which does not implement typedef. Thus I prefer typedef, because sophisticated compilers can do better error checking with it. However, if you want to be absolutely sure that your data type redefinition will be accepted by all old compilers, you must use preprocessor defines.)
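The open-mode and sized-type definitions just described might be sketched as follows. This is an illustrative sketch, not the article's environ.h: the OS test is collapsed to a single constant, and the port-i/o naming appears only in a comment.

```c
#include <assert.h>
#include <stdio.h>

/* Illustrative sketch of OS-dependent defines. One constant stands
 * in for the full OS selection done in environ.h. */
#define UNIX 1                /* pretend we are building for UNIX */

#if UNIX
#define OPM_WB "w"            /* UNIX makes no text/binary distinction */
typedef int  int32;           /* PCC on 68xxx: int already has 32 bits */
#else
#define OPM_WB "wb"           /* MS-DOS libraries need explicit binary mode */
typedef long int32;           /* 16-bit int on 80x86, so map int32 to long */
#endif

/* The port-i/o name difference would be hidden the same way, e.g.:
 *   #if MSC
 *   #define port_in(p)  inp(p)
 *   #else
 *   #define port_in(p)  inportb(p)
 *   #endif
 */

int32 records = 2000000L;     /* safe on both machine classes */
```

With these definitions in one header, `fp = fopen("file", OPM_WB);` and every use of `int32` compile unchanged in either environment.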
By now it is obvious that the preprocessor can help make programs more portable. What would make more sense than to combine all these preprocessor-based aids in a single header file? For nearly two years I have been using such a file, working mainly with four different MS-DOS compilers and the UNIX PCC. The idea developed because of minor standard-library differences between MS-DOS compilers, but it soon became clear that the header file could help when porting to UNIX, too. The still-incomplete result is described below.

environ.h

All the necessary preprocessor statements and typedefs are collected in one single file named environ.h (Listing 1). It should be the very first file included. Before including environ.h, you should define which other standard include files you need. This is done by defining preprocessor constants which correspond to standard include file functionality. You read right: functionality -- not names. For example, if you define INCL_ASSERT, not only will the file assert.h be included, but also the file process.h, which MS-DOS/MSC requires. If you compile under UNIX, only assert.h is included. Defining these constants in terms of functionality hides the include file name differences -- an important feature that saves you many conditional directives in the source modules. Microsoft uses a similar system for their OS/2 header files in MSC 5.1. When completely defined for your environment, environ.h should #include all include files needed by your application. If you find it necessary to explicitly include other files, you should extend the definitions in environ.h. They are still incomplete (see lines 274 - 401).

environ.h begins by preventing the accidental inclusion of a header file more than once. Multiple inclusion may damage some preprocessor defines. At best it will cause additional overhead, and at worst, program errors may occur.
To prevent these problems, environ.h checks the preprocessor constant ENVIRON_H. If this constant is defined, environ.h assumes that it has been previously included and takes no further steps (via the #ifndef ENVIRON_H in line 26). If ENVIRON_H is not defined, then this is the first inclusion of environ.h and processing takes place. First, ENVIRON_H is defined, ensuring that no second inclusion will be possible. Next, based on which compiler and operating system are active, environ.h defines the target environment.

Information about the environment is acquired in a relatively straightforward way (lines 29 - 165). Operating-system-specific constants that may be defined automatically by the compiler are purged -- they will be replaced with your own. The #undef of the default definitions is not strictly necessary, but it prevents possible warning messages when the compiler default constants are redefined. The #undefs are followed by defines which select the target OS. Only one may be active at a time. Note the definition to 0 or 1. You could instead define only one OS constant and use #ifdef rather than #if CONSTANT == 1, but this has the disadvantage that K&R compilers have no "#if defined(CONSTANT)". Without this construct it is hard to build complex preprocessor conditionals using only #ifdef and #ifndef, because you can't use Boolean operators. If you define the constants to 0 and 1, you can build normal conditional expressions. This is an advantage when you consider that you must often ask questions like

#if MSDOS && USE_BIOS

Following the OS definitions there are some auxiliary definitions, used only under a specific OS, to identify the target machine. Currently these apply only to certain generic MS-DOS machines without compatible hardware or BIOS, which require actual MS-DOS calls (as opposed to BIOS calls or direct hardware manipulation). The only common example is the early Wang PCs, for which there is a separate definition.
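The structure described so far can be summarized in a small skeleton. This is a sketch of the scheme, not Listing 1 itself: only two OS constants are shown, and the assert.h include exists only for the demonstration checks.

```c
#include <assert.h>           /* only for the demonstration checks */

/* Skeleton of the environ.h structure described in the text. */
#ifndef ENVIRON_H             /* guard against multiple inclusion */
#define ENVIRON_H

#undef MSDOS                  /* purge compiler-supplied defaults... */
#undef UNIX

#define MSDOS 1               /* ...and replace them with our own;  */
#define UNIX  0               /* exactly one constant is set to 1   */

/* Because the constants are always defined as 0 or 1, even a K&R
 * preprocessor can combine them with Boolean operators: */
#if MSDOS && !UNIX
#define TARGET_OK 1
#endif

#endif /* ENVIRON_H */
```

A second inclusion of this file is a no-op: ENVIRON_H is already defined, so the guard skips everything.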
The operating system definitions are followed by the compiler definitions. A specific compiler selection is only necessary if more than one is available under one OS. In my case this is only needed for MS-DOS. But as you can see in environ.h there is only a definition for MSC. All the other compilers I use identify themselves by defining a constant automatically upon startup (e.g., __TURBOC__ for Borland's Turbo C). Note that the MSC constant is overridden if one of the other predefined constants is detected or an OS other than MS-DOS is active (lines 88 - 106). This feature simplifies proper configuration of the header file.

Separate constants for each compiler allow conditional compilation around small compiler differences. To avoid code like "#if MSC || DLC || LC || __TURBOC__ ...", environ.h introduces some language set selection constants (lines 70 - 76). Each define corresponds to one language feature. If the constant is equated to true (1), that language feature can be used; otherwise it cannot. All other decisions are based on these feature selection constants and are much more readable. Now the example given above takes the more intelligible form #if USE_VOID.

To avoid modifying all language selection constants each time you change compilers, environ.h includes an automatic language set selection which redefines the language set constants based on the compiler and OS definitions. While auto selection is currently only functional in the MS-DOS environment, it can easily be expanded to work under different operating systems (lines 129 - 164).

To complete the environment definition, environ.h defines the constant ANSI_C to 0 or 1 according to the compiler's C standard (K&R/ANSI) (lines 119 - 127). This constant is currently set based on the state of a language feature selection (like USE_VOID), but could become more important in the future.

The example header file still lacks one feature, a definition check. All definitions are accepted as entered.
If, for example, the programmer defines two or more operating systems to 1, the behavior of environ.h is undefined but clearly erroneous. This could be avoided by checking the entered definitions to see if two or more are true and aborting compilation if so:

#if MSDOS && UNIX
"Error: Both MSDOS and UNIX selected"
#endif

This code asks for the error condition and generates a compile-time error if it detects one. The error message generated by the compiler points at the real error message in the source module. Examples can be found in CUG library volume 227 (compatible graphics) in file graphics.h. This file contains extensive definition checking.

So far environ.h has supplied definitions that allow conditional compilation in the source units but no automatic porting aids. The balance of the file addresses this second need. Different compiler data types and modifiers can be hidden largely by preprocessor defines. For example, if the compiler doesn't support the void keyword, just define void to nothing, and the void keyword will disappear. Since you didn't use void originally when writing for that compiler, this disappearance will cause no problems. Your code can now be used with compilers that support void without any additional work. That is the key feature of modifier definition: you can hide all data type and modifier differences by simply defining the data type in question to nothing (as in lines 167 - 195 in environ.h).

Here's another example: if a compiler doesn't support the volatile modifier, it normally doesn't do the strange optimizations that force you to use volatile (or they can be turned off), so there is no problem in purging all volatile modifiers from your source. This kind of type redefinition allows you to use the types on machines supporting them without losing backward compatibility. If an older compiler doesn't support these type modifiers, their extra value is gone but your program still runs without problems.
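A self-contained version of that check (the bare string is the article's trick for preprocessors that predate #error; with a consistent configuration, as here, it compiles silently):

```c
#include <assert.h>

#define MSDOS 1
#define UNIX  0

/* Definition check: if two operating systems were selected at once,
 * the bare string below would be compiled as a statement fragment and
 * stop the build with a syntax error pointing at this very line. */
#if MSDOS && UNIX
"Error: Both MSDOS and UNIX selected"
#endif

int exactly_one_os(void) { return MSDOS + UNIX == 1; }
```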
Most data types and modifiers can be treated in this manner. (In some cases you may instead redefine the type to something different -- e.g., define void to int instead of purging it.) However, some types and modifiers, like enum, can't simply be redefined to nothing or to some other value. If you try to redefine these types, your program won't compile, due to the syntax differences between defining a "normal" data item and an enum one. Defining an enum is a process nearly identical to defining a structure or union. Special definitions are required; you can't hide them with one general define. You can still use enum on supporting and non-supporting compilers, but you must define all your enum types using conditional compilation. If the compiler supports enum, you can use it without difficulty. If not, you define an int type and use the preprocessor to define the enum tags:

#if USE_ENUM
typedef enum { A, B } enumtype;
#else
typedef int enumtype;
#define A 0
#define B 1
#endif

This clearly entails more programming work but allows the use of the extended error checking features of compilers that support enum.

You can define your own data types to hide hardware differences, especially machine word length differences. These "personal types" have a guaranteed minimum and maximum precision and are mapped to the actual hardware data types. By relying on them, you can write programs that work on different machines in an expected manner, and you can take memory requirements into account because there is a guaranteed MAXIMUM precision. This problem wasn't critical to me, so the example header file contains only very limited support (lines 258 - 261). Please note that typedefs are used instead of preprocessor defines.

The next problem area is that of standard library function names and calling conventions. For example, calling exit() in C will commonly terminate your program gracefully. Under the Starsys OS, exit() is an OS call something like abort().
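The enum fragment above, made self-contained (USE_ENUM and the tags A and B are the article's; the helper function is added here only so the emulation can be exercised):

```c
#include <assert.h>

#define USE_ENUM 0   /* pretend this compiler has no enum keyword */

#if USE_ENUM
typedef enum { A, B } enumtype;
#else
typedef int enumtype;  /* fall back to int plus preprocessor tags */
#define A 0
#define B 1
#endif

/* Helper (not in the article) showing both branches behave alike. */
enumtype successor(enumtype e) { return e + 1; }
```

Flipping USE_ENUM to 1 compiles the true enum branch with no change to code that uses enumtype, A, or B.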
The real exit() function has been named dx_exit(). This causes problems for all but a few programs and would normally require text modifications. But that's exactly what the preprocessor can do for you: if you're running under Starsys, just define a macro named exit which takes one parameter (the return value). It expands to a call to dx_exit() with that parameter (lines 234 - 236). A similar technique hides the variations among library functions with different names but identical calling parameters and functionality. Example macro definitions can be found a few lines above the exit() macro. File open modes are addressed in lines 241 - 253. Please note that not all open modes are supported, but the definitions can be easily expanded.

Function Prototyping

Unfortunately, ANSI function prototyping is not supported in every environment. Rather than sacrificing the extended error checking that prototyping offers by not using it at all, you can use prototyping when the compiler supports it and turn it off when it does not. Turning off function prototyping is a little harder than turning off an unknown modifier. First you must build two classes of function prototypes, external and internal, corresponding to external and static functions. The external prototype macros appear in lines 197 - 211. The PROTT macro expands to extern func() for a K&R compiler and to extern func(int) for ANSI compilers.

Please note the extra parentheses around int in the PROTT definition. These parentheses become part of the macro argument and are re-expanded. After expansion, they are the function parentheses of extern func(int). These parentheses are especially important if you want to prototype a function with more than one argument. If there were no inner parentheses, the macro would have two arguments, which would force you to write one prototyping macro for every number of function arguments you would ever use.
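A self-contained sketch of the PROTT trick (the macro shape follows Listing 1, lines 197 - 211; the scale function is a hypothetical example, not from the article):

```c
#include <assert.h>

#define USE_PROTT 1   /* set to 0 for a K&R compiler */

#if USE_PROTT
#define PROTT(x) x    /* the inner parentheses survive expansion and
                       * become the function's parameter list */
#else
#define PROTT(x) ()   /* prototype collapses to an empty list */
#endif

/* Thanks to the inner parentheses, the whole parameter list is ONE
 * macro argument, so a single PROTT handles any number of parameters. */
extern long scale PROTT((long value, int factor));

long scale(long value, int factor) { return value * factor; }
```

With USE_PROTT set to 0, the declaration becomes plain `extern long scale ();`, acceptable to a K&R compiler.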
Given these inner parentheses, the whole prototype is one macro argument and a single prototyping macro satisfies all needs.

Normally you write a function header only once for each internal function. It is more difficult to hide these prototypes: the modern ANSI style is to write argument types and names in the function header (e.g., static func(int a)), while the K&R style is to write the argument names only (static func(a)). Fortunately, ANSI compilers accept function headers written in K&R style, but they usually don't build prototypes for such headers. One solution is to write the prototype first and then the actual function header (STATICPT(func, (int)); followed by static func()). In this case the prototype first declares the function extern in order to prototype it (just as is done in application header files). While this has worked well with all ANSI compilers I know of, I'm not certain that it is guaranteed to be legal under the ANSI standard. At first glance you may wonder why the prototype does not have the form static func PROTT((int)), and in fact I am not sure if these constructs are legal. Most compilers accept a function being declared extern and later redefined as static. However, the MSC compiler doesn't accept this construct and generates error messages (at least QC does; CL accepts it with warnings). Instead, MSC allows both the function prototype and the actual function header to be declared static -- the approach used in environ.h. If MSC is active, the prototype attribute is redefined to static.

To do this the macro must have control over the whole prototype line, not just part of it. So a new construct has been created. The macro has two parameters: the function name and the prototype. It expands to the correct modifier followed by the function name and (if selected) the function prototype.
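Here is STATICPT in miniature, using the MSC-style variant in which both the prototype and the definition carry static (macro shape per Listing 1, lines 197 - 211; the function twice and its wrapper are illustrative additions):

```c
#include <assert.h>

#define USE_PROTT 1

#if USE_PROTT
/* MSC variant: the prototype is static, matching the definition. */
#define STATICPT(func, prott) static func prott
#else
#define STATICPT(func, prott) static func ()
#endif

STATICPT(int twice, (int n));   /* expands to: static int twice (int n); */

static int twice(int n) { return 2 * n; }

/* External wrapper so the static function can be called from outside. */
int call_twice(int n) { return twice(n); }
```

Because the macro owns the entire prototype line, it can substitute static or extern as the compiler requires, which no modifier-only define could do.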
This may be a somewhat unusual macro construct, but remember that the C preprocessor is mainly a text substitution tool and not part of the actual compilation process. This allows the preprocessor to make some very strange modifications to the C source code, including constructs like the static function prototyping which cannot be done by any C statement. Building such unusual constructs can give very simple solutions to otherwise intractable problems. The STATICPT() macro can be found between lines 197 and 211.

Conclusions

As you can see, the environmental header file environ.h can aid in writing portable programs, especially in the problem areas of data type, modifier, and name differences. In addition, some machine specifics can be hidden and some newer constructs mapped to work with older compilers. On the other hand, the header file can't hide some differences (e.g., different mechanisms for interacting with the user console). Such differences require special coding that normally should be contained in external modules. But the header file can help you write these modules too, by precisely defining the target environment. Precise functional definitions are the basis for selecting the right code sequences in the low-level driver modules (assuming that coding for more than one environment can be contained in one source unit). The definitions will aid you in activating slightly different source lines which you may have in your program.

Thus, a larger porting system is built from three modules. First, the environment header file describes the environment and hides all differences possible using the preprocessor and typedefs (mainly text substitutions). Second, libraries of standardized functions handle larger problem areas that actually require different coding. Third, conditional compilation within the source modules hides very small differences where the text-substitution capabilities of the preprocessor are insufficient and a special function call makes no sense.
This last option should be limited to cases where it is absolutely necessary, because conditional compilation is not really portable programming, but rather having code for all known environments. If you switch to a new environment, you must not only write new code but also hunt for the problem areas in the source files. To avoid these problems I recommend flagging such lines with special comments (e.g., /*PORT*/).

Related code can be found in the CUG library holdings. Volume CUG227 contains a compatible graphics system which makes extensive use of the preprocessor's text substitution capabilities. Volume CUG265, the cpio starter kit, contains a header file similar to the one discussed here. It also contains programs using it.

Listing 1

1: /*
2:  * e n v i r o n . h
3:  * -----------------
4:  * This module contains environment specific information.
5:  * It's used to make the programs more portable.
6:  *
7:  * @(#)Copyright (C) by Rainer Gerhards. All rights reserved.
8:  *
9:  * Include-file selection defines are:
10:  *
11:  * Define        Class
12:  * ---------------------------------------------------------
13:  * INCL_ASSERT   assert macro and needed functions
14:  * INCL_CONIO    low-level console i/o
15:  * INCL_CONVERT  conversion and classification functions
16:  * INCL_CTYPE    ctype.h
17:  * INCL_CURSES   curses.h
18:  * INCL_LLIO     low-level i/o
19:  * INCL_MEMORY   memory allocation/deallocation functions
20:  * INCL_MSDOS    MS-DOS support
21:  * INCL_PROCESS  process control
22:  * INCL_STDIO    stdio.h
23:  * INCL_STDLIB   standard library functions
24:  * INCL_STRING   string handling functions
25:  */
26: #ifndef ENVIRON_H
27: #define ENVIRON_H
28: 
29: #undef MSDOS
30: #undef OS2
31: #undef UNIX
32: #undef STARSYS
33: 
34: /*
35:  * configurable parameters.
36:  * modify the following parameters according to the target environment.
37:  */
38: 
39: /*
40:  * define target operating system
41:  */
42: #define MSDOS    0
43: #define UNIX     0
44: #define OS2      1
45: #define STARSYS  0
46: 
47: /*
48:  * define target machine
49:  *
50:  * This is auxiliary data only needed for some operating
51:  * systems. Currently only needed if MS-DOS is active.
52:  */
53: #define IBM_PC   1  /* IBM PC, XT, AT & compatibles */
54: #define WANG_PC  0  /* Wang PC, APC ... */
55: 
56: /*
57:  * define target compiler (if necessary)
58:  */
59: #undef MSC
60: #define MSC      1  /* Microsoft C */
61: 
62: #define AUTO_SEL 1
63: /*
64:  * The above #define allows an automatic language set selection. It is
65:  * only functional if the used compiler identifies itself via a #define.
66:  *
67:  * Note: If AUTO_SEL is set, the parameters below are meaningless!
68:  */
69: 
70: #define USE_FAR   0  /* use far keyword */
71: #define USE_NEAR  0  /* use near keyword */
72: #define USE_VOID  1  /* use void keyword */
73: #define USE_VOLA  0  /* use volatile keyword */
74: #define USE_CONST 0  /* use const keyword */
75: #define USE_PROTT 0  /* use function prototypes */
76: #define USE_INTR  0  /* use interrupt keyword */
78: /* +--------------------------------------------------------+
79:  *            End Of Configurable Parameters
80:  * +--------------------------------------------------------+
81:  * Please do not make any changes below this point!
82:  */
83: 
84: #ifndef SYMDEB
85: # define SYMDEB 0
86: #endif
87: 
88: /*
89:  * Check target compiler. Note that the MSC switch is overridden if
90:  * either __TURBOC__ or DLC are defined.
91:  */
92: #ifdef __TURBOC__
93: # undef MSC
94: #endif
95: #ifdef DLC
96: # undef MSC
97: #endif
98: #if STARSYS
99: # undef MSC
100: #endif
101: 
102: #if !(MSDOS || OS2)
103: # undef MSC
104: # undef AUTO_SEL
105: # define AUTO_SEL 0
106: #endif
107: 
108: #if OS2
109: # undef MSC
110: # define MSC 1
111: # undef AUTO_SEL
112: # define AUTO_SEL 1
113: #endif
114: 
115: /*
116:  * Compiler ANSI-compatible?
117:  * (First we assume it's not!)
118:  */
119: #define ANSI_C 0
120: #ifdef MSC
121: # undef ANSI_C
122: # define ANSI_C 1
123: #endif
124: #ifdef TURBO_C
125: # undef ANSI_C
126: # define ANSI_C 1
127: #endif
128: 
129: #if AUTO_SEL
130: # undef USE_FAR
131: # undef USE_NEAR
132: # undef USE_VOID
133: # undef USE_VOLA
134: # undef USE_CONST
135: # undef USE_PROTT
136: # undef USE_INTR
137: # ifdef __TURBOC__
138: #  define USE_FAR   1
139: #  define USE_NEAR  1
140: #  define USE_VOID  1
141: #  define USE_VOLA  1
142: #  define USE_CONST 1
143: #  define USE_PROTT 1
144: #  define USE_INTR  1
145: # endif
146: # ifdef DLC
147: #  define USE_FAR   1
148: #  define USE_NEAR  1
149: #  define USE_VOID  1
150: #  define USE_VOLA  1
151: #  define USE_CONST 1
152: #  define USE_PROTT 1
153: #  define USE_INTR  0
154: # endif
155: # ifdef MSC
156: #  define USE_FAR   1
157: #  define USE_NEAR  1
158: #  define USE_VOID  1
159: #  define USE_VOLA  1
160: #  define USE_CONST 1
161: #  define USE_PROTT 1
162: #  define USE_INTR  1
163: # endif
164: #endif
165: 
166: 
167: #if !USE_FAR
168: #define far
169: #endif
170: 
171: #if !USE_NEAR
172: #define near
173: #endif
174: 
175: #if !USE_VOID
176: #define void
177: #endif
178: 
179: #if !USE_VOLA
180: #define volatile
181: #endif
182: 
183: #if !USE_CONST
184: #define const
185: #endif
186: 
187: #if USE_INTR
188: # ifdef MSC
189: #  define INTERRUPT interrupt far
190: # else
191: #  define INTERRUPT interrupt
192: # endif
193: #else
194: # define INTERRUPT
195: #endif
196: 
197: #if USE_PROTT
198: # define PROTT(x) x
199: # ifdef MSC
200: #  define STATICPT(func, prott) static func prott
201: # else
202: #  define STATICPT(func, prott) extern func prott
203: # endif
204: #else
205: # define PROTT(x) ()
206: # ifdef MSC
207: #  define STATICPT(func, prott) static func ()
208: # else
209: #  define STATICPT(func, prott) extern func ()
210: # endif
211: #endif
212: 
213: #ifdef MSC
214: # define inportb(port)       inp(port)
215: # define outportb(port, val) outp(port, val)
216: #endif
217: 
218: #ifdef __TURBOC__
219: # define REGPKT struct REGS
220: #else
221: # define REGPKT union REGS
222: #endif
223: 
224: #ifdef DLC
225: # define defined(x)
226: # define inportb  inp
227: # define outportb outp
228: #endif
229: 
230: #if !SYMDEB  /* symbolic debugging support */
231: # define STATICATT static
232: #endif
233: 
234: #if STARSYS
235: # define exit(x) dx_exit(x)
236: #endif
237: 
238: /*
239:  * Define open modes according to selected operating system/compiler.
240:  */
241: #if MSDOS || OS2
242: # define OPM_WB "wb"
243: # define OPM_WT "wt"
244: # define OPM_RB "rb"
245: # define OPM_RT "rt"
246: #endif
247: 
248: #if UNIX
249: # define OPM_WB "w"
250: # define OPM_WT "w"
251: # define OPM_RB "r"
252: # define OPM_RT "r"
253: #endif
254: 
255: #define TRUE  1
256: #define FALSE 0
257: 
258: typedef unsigned char  uchar;
259: typedef int            bool;
260: typedef unsigned short ushort;
261: typedef unsigned long  ulong;
262: 
263: #define tonumber(x) ((x) - '0')
264: #define FOREVERL()  for(;;)
265: 
266: /*
267:  * Select #include-files depending on target compiler and OS.
268:  *
269:  * Phases:
270:  *  1. Define all include selection constants to true or false.
271:  *  2. Select actual include files and include them.
272:  *  3. #undef all include selection constants.
273:  */
274: #ifndef INCL_STDIO
275: # define INCL_STDIO 0
276: #else
277: # undef INCL_STDIO
278: # define INCL_STDIO 1
279: #endif
280: #ifndef INCL_CURSES
281: # define INCL_CURSES 0
282: #else
283: # undef INCL_CURSES
284: # define INCL_CURSES 1
285: #endif
286: #ifndef INCL_CTYPE
287: # define INCL_CTYPE 0
288: #else
289: # undef INCL_CTYPE
290: # define INCL_CTYPE 1
291: #endif
292: #ifndef INCL_ASSERT
293: # define INCL_ASSERT 0
294: #else
295: # undef INCL_ASSERT
296: # define INCL_ASSERT 1
297: #endif
298: #ifndef INCL_LLIO
299: # define INCL_LLIO 0
300: #else
301: # undef INCL_LLIO
302: # define INCL_LLIO 1
303: #endif
304: #ifndef INCL_PROCESS
305: # define INCL_PROCESS 0
306: #else
307: # undef INCL_PROCESS
308: # define INCL_PROCESS 1
309: #endif
310: #ifndef INCL_MEMORY
311: # define INCL_MEMORY 0
312: #else
313: # undef INCL_MEMORY
314: # define INCL_MEMORY 1
315: #endif
316: #ifndef INCL_STRING
317: # define INCL_STRING 0
318: #else
319: # undef INCL_STRING
320: # define INCL_STRING 1
321: #endif
322: #ifndef INCL_STDLIB
323: # define INCL_STDLIB 0
324: #else
325: # undef INCL_STDLIB
326: # define INCL_STDLIB 1
327: #endif
328: #ifndef INCL_CONVERT
329: # define INCL_CONVERT 0
330: #else
331: # undef INCL_CONVERT
332: # define INCL_CONVERT 1
333: #endif
334: #ifndef INCL_MSDOS
335: # define INCL_MSDOS 0
336: #else
337: # undef INCL_MSDOS
338: # define INCL_MSDOS 1
339: #endif
340: #ifndef INCL_CONIO
341: # define INCL_CONIO 0
342: #else
343: # undef INCL_CONIO
344: # define INCL_CONIO 1
345: #endif
346: 
347: #if INCL_STDIO && !(INCL_CURSES && UNIX)
348: # include <stdio.h>
349: #endif
350: #if INCL_CURSES && UNIX
351: # include <curses.h>
352: #endif
353: #if INCL_CTYPE || INCL_CONVERT
354: # include <ctype.h>
355: #endif
356: #if INCL_ASSERT
357: # include <assert.h>
358: # ifdef MSC
359: #  undef INCL_PROCESS
360: #  define INCL_PROCESS 1
361: # endif
362: # ifdef __TURBOC__
363: #  undef INCL_PROCESS
364: #  define INCL_PROCESS 1
365: # endif
366: #endif
367: #if INCL_LLIO
368: # ifdef MSC
369: #  include <io.h>
370: #  include <fcntl.h>
371: # endif
372: #endif
373: #if INCL_PROCESS
374: # ifdef MSC
375: #  include <process.h>
376: # endif
377: #endif
378: #if INCL_MEMORY
379: # include <malloc.h>
380: #endif
381: #if INCL_STRING
382: # if ANSI_C
383: #  include <string.h>
384: # endif
385: #endif
386: #if INCL_STDLIB || INCL_CONVERT
387: # if ANSI_C
388: #  include <stdlib.h>
389: # endif
390: #endif
391: #if INCL_CONIO
392: # ifdef __TURBOC__
393: #  include <conio.h>
394: # endif
395: # ifdef MSC
396: #  include <conio.h>
397: # endif
398: #endif
399: #if MSDOS && INCL_MSDOS
400: # include <dos.h>
401: #endif
402: 
403: 
404: /*
405:  * Purge utility #defines.
406:  */
407: #undef INCL_STDIO
408: 
409: #endif

Writing Standard Headers: The String Functions

Dan Saks

Dan Saks is the owner of Saks & Associates, which offers training and consulting in C and C++. He is a member of X3J11, the ANSI C committee. He has an M.S.E. in computer science from the University of Pennsylvania. You can write to him at 287 W. McCreight Ave., Springfield, OH 45504 or call (513) 324-3601.

In a recent letter to The C Users Journal, Phil Cogar of N.S.W. Australia complained that much of the C source code appearing in this and other programming journals contains references to headers that are not published along with the code. He observed that if your compiler provides these headers, then typing in the code and getting it to run is usually easy; without them, it may be impossible. He has a legitimate complaint, but as editor Robert Ward points out in his response, it's often impractical to publish the headers with the code. (See The C Users Journal, October 1989, p. 138.)

To get the programs to run, you can write your own standard headers to go with your existing compiler and library. Although writing an entire Standard C library from scratch is a big chore, you can fill many of the gaps in an existing library by yourself in only a few days.

The Standard Headers

The fifteen headers specified by the Standard are summarized in Table 1.
Most of them declare a set of related library functions, along with any macros and types needed to call them. A few headers don't contain any functions; they simply define useful macros and types that have nowhere else to go. Some macros and types appear in more than one header, but each function is declared only once.

Most compilers supply additional headers. For example, UNIX compilers add headers of their own, and many MS-DOS compilers supply some of the UNIX headers along with others specific to MS-DOS. None of these headers is covered by the C Standard. Some UNIX headers have been formalized by the IEEE 1003.1 POSIX Portable Operating System Standard, but many aren't covered by any non-proprietary standard. A C program using library headers other than those listed in Table 1 will not be portable to all Standard C implementations.

A program accesses the contents of a standard header by referencing the header in an include directive, such as #include <string.h>. Headers are often referred to as "include files" because they are almost always implemented as source files with the same names. Other implementations are permitted, and so the Standard is careful not to refer to them as files. Nevertheless, "headers" and "include files" are generally understood to mean the same thing.

Determining What You Already Have

Before starting to fix your standard headers, you should look to see what you already have. Headers are usually easy to locate. For example, on UNIX systems the headers for cc are usually in /usr/include (see the subheading FILES on the manual page(s) for cc(1) in your UNIX manual). The default setup for Turbo C on MS-DOS places the headers in \turboc\include. Most MS-DOS compilers do something similar. The headers for DECUS C on my PDP-11 are in the same subdirectory as my compiler executables, which is a subdirectory with the logical name C:. You should not be surprised to find that you already have several of the standard headers.
The standard library is not pure invention; it's the result of an effort to "codify common existing practice." You will almost certainly find a version of <stdio.h> -- the only standard header used by Kernighan and Ritchie in the first edition of The C Programming Language. One or two others are also extremely common. Beyond that, it's hard to say just how many headers you're likely to find. For example, the DECUS C compiler has only four of the standard headers, while the UNIX 4.2 BSD compiler (cc) has those four plus several more, as well as a header very similar to one of the standard ones. Turbo C 2.0, Microsoft C 5.1 and Zortech C 1.07 (all for MS-DOS) have all but one of the standard headers, but very few of the headers among all three compilers are exactly as they should be.

Where To Put New Headers

Before you start creating and modifying headers, you should think about where to put them. You can throw caution to the wind and put the new headers in the same directory as your existing ones (assuming you have the access rights), but then you run a serious risk that some of your old code won't work with the new headers. I recommend creating a directory for your new headers and reconfiguring your compiler environment to search this new directory before it searches the old one. You can then remove the new headers from the search if you have to. Compiler environments vary so much that I can't explain how to do this for everyone, but I will show you what I've done on a few different systems.

On UNIX 4.2 BSD: I put the new headers in a subdirectory usr/include within my home directory (/u/dsaks). I wrote a shell script called cc that simply contains

/bin/cc -I/u/dsaks/usr/include $*

This script invokes the UNIX C compiler (in /bin) with the -I option. -I tells the compiler to search for include files in the named directory before searching in the standard places. The $* passes all the arguments to the cc script through to the C compiler. I put this script in /u/dsaks/usr/bin, and added this directory name to my shell path variable.
I made the script executable with

chmod +x cc

This cc command compiles with the new headers. If I need to omit them, I simply rename the command with

mv cc cc.new

so that the cc command reverts to the one in /usr/bin (without -I).

On MS-DOS 3.0 and higher: I put the original headers for Microsoft C and Quick C in \ms\include, and my new headers in \ms\usr\include. Both compilers support the -I option, so you can create a cc.bat command file like the UNIX shell script. Yet Microsoft gives you an easier alternative. The Microsoft compilers use the INCLUDE environment variable to define the search path for include files. I use two different command files to configure the compiler environment. My msnew.bat uses

set INCLUDE=c:\ms\usr\include;c:\ms\include

to put the new headers in the search path, while msold.bat uses

set INCLUDE=c:\ms\include

to take them out. Other MS-DOS compilers require slightly different approaches. Zortech's command line compiler, ZTC, uses the INCLUDE environment variable just like Microsoft C, but their integrated environment, ZED, gets its search path from a configuration file maintained by a utility called ZCONFIG. Borland's Turbo C lets you specify the search path in a file called TURBOC.CFG. Consult your compiler user's guide for details.

On RT-11 V5.0 and higher: The DECUS C compiler has a built-in preprocessor that's virtually useless. Fortunately, the compiler is distributed with MP, a decent preprocessor from the UNIX User's Group. My compilation command files disable the built-in preprocessor (with the /M compiler switch) and use MP instead. MP has a preset search path for include files. First it looks in the directory with the logical name LB:, then in C:, and finally in SY:. I put the original headers in a directory assigned to C: and the new headers in another directory assigned to LB:. I can remove the new headers from the search by deassigning LB:.
I'll begin with <string.h> because it's often missing and yet is easy to create. Once you have it, you'll use it frequently. <string.h> (see Table 2) declares the string handling functions in the library. It also declares one macro, NULL, and one type, size_t, that are needed to use these functions.

There is no universal way to define NULL -- you tailor the definition to your machine's architecture. The easiest way to obtain a definition for NULL is to steal one from <stdio.h>. If you can't find a definition there or in some other header, then you should probably use

#define NULL ((void *)0)

if your compiler supports the void * type, or

#define NULL ((char *)0)

if it doesn't. If you know that your pointers have the same size as type int, you can use simply

#define NULL 0

If the pointers on your machine have the same size as type long int, you can use

#define NULL 0L

I prefer to use the casts to determine the size of NULL. However, I suspect you'll find that one of the latter two forms is already used in your existing headers. Whichever form you choose, use it consistently.

Most MS-DOS C compilers provide pointers in two different sizes, near and far. The headers in these compilers use conditional compilation to select the appropriate definition for NULL, something like

#ifdef _NEAR_POINTERS
#define NULL 0
#else
#define NULL 0L
#endif

If your <string.h> needs a definition like this, you should find it in one of your existing headers. (For more insight into the possible definitions for NULL, see "Doctor C's Pointers: The 'NULL' Macro and Null Pointers" by Rex Jaeschke in The C Users Journal, Sept/Oct, 1988.)

NULL is defined in several standard headers. The headers may be included in any order, and a given header may be included more than once, so you must insure that the repeated definitions for NULL don't conflict with each other. Most implementations permit "benign" macro redefinitions (repeated definitions formed by identical sequences of tokens) as specified in the Standard.
In this case, make all the definitions the same. If your preprocessor doesn't allow any redefinitions, you will have to put a "protective wrapper" around each one, as in

#ifndef NULL
#define NULL ((void *)0)
#endif

size_t is the type of the result of the sizeof operator. The Standard says that it should be an unsigned integral type, so use either

typedef unsigned size_t;

or

typedef unsigned long size_t;

You can select the appropriate definition using the program in Listing 1. In many C implementations, sizeof yields a signed int value. You should still define size_t as unsigned, so that operations on objects of that type have the proper unsigned behavior. You can always use size_t to cast the possibly negative result of sizeof to its 'true' unsigned value, as in

if ((size_t)sizeof(something_big) > 0)

For more about size_t and sizeof, see "Doctor C's Pointers: Exploring the Subtle Side of the 'sizeof' Operator" by Rex Jaeschke in The C Users Journal, Feb., 1988 or see Rex's book, listed in References.

As with NULL, size_t appears in several standard headers. The Standard and many implementations do not allow typedef redefinitions (even "benign" ones) in the same scope, so you may need a protective wrapper around each definition. For example

#ifndef _SIZE_T_DEFINED
typedef unsigned size_t;
#define _SIZE_T_DEFINED
#endif

You don't have to use the name _SIZE_T_DEFINED. Any identifier beginning with an underscore followed by an upper-case letter or another underscore will do. The Standard reserves these names for the implementation of the compiler (of which the headers are a part). Since benign macro redefinitions are usually allowed, you may be tempted to define size_t as

#define size_t unsigned

in order to eliminate the protective wrapper. I have seen this done in some "ANSI-conforming" compilers. Although you will probably never notice the difference, the macro definition is wrong because it changes the scope of size_t. Use the typedef.

And now for the functions.
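Listing 1 of this article is not reproduced in this excerpt. A minimal sketch of such a selection program (an assumption about its shape, not the article's actual code) simply reports which typedef matches the width that sizeof yields on the machine at hand:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the article's Listing 1: pick the typedef
 * for size_t whose width matches what sizeof actually yields here.
 * sizeof(sizeof(int)) is the size of the type that sizeof returns. */
const char *size_t_definition(void)
{
    return sizeof(sizeof(int)) <= sizeof(unsigned)
         ? "typedef unsigned size_t;"
         : "typedef unsigned long size_t;";
}
```

Run once per target machine and paste the reported line into your new <string.h>.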
Most older C compilers don't support prototypes, so you might have to delete or "comment out" the parameter lists. Some functions return void *. If your compiler won't accept that type, use char *. You will find that your library contains some, but not all, of the string functions. Sometimes you will find a standard C function under an archaic name. Many recent books on C have an appendix that details the functions in the standard library. (See references at the end of the article.) You should compare the functions in the standard library with the functions in your compiler's library to find as many matches as you can. For example, some implementations use index instead of strchr. In this case, you could declare strchr as char *index(); #define strchr(s, c) index(s, c) but there is a hazard. If you forget that strchr is really index, and write another function called index, you will inadvertently redefine strchr. (This is an excellent way to test your debugging skills.) This macro definition should only be used as an interim fix until you add a compiled version of the missing function to the run-time library. What about functions that are completely missing? Should you still put their declarations in <string.h>? The answer is a definite maybe. Suppose that memchr is missing from your library. memchr returns a void *, but if you leave the declaration out of <string.h>, the compiler will assume it returns an int. When you compile char *p, s[10]; p = memchr(s, 'x', 10); you may get a spurious warning about an illegal pointer assignment, but compilation will continue. You won't know what's really happening until the linker reports that memchr is undefined. Under these circumstances, you should declare memchr in the header to eliminate the unnecessary warnings. If you use a Lint-like program checker that can detect undeclared functions (or if your compiler has such an option), then don't declare functions that are missing from the library.
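As a sketch of what such a compiled replacement for a missing function might look like, here is a minimal strchr. It is written in modern, prototyped style; on the older compilers discussed here you would drop the prototype and const. The name my_strchr is used only to avoid colliding with a library that already has a strchr:

```c
#include <stddef.h>  /* for NULL; pre-ANSI systems would use a local definition */

/* Minimal strchr: return a pointer to the first occurrence of c
 * (converted to char) in s, or NULL if c does not occur.  The
 * terminating nul counts as part of the string, so searching for
 * '\0' returns a pointer to the terminator. */
char *my_strchr(const char *s, int c)
{
    for (;; ++s) {
        if (*s == (char)c)
            return (char *)s;
        if (*s == '\0')
            return NULL;
    }
}
```

Once a routine like this is compiled into the run-time library, the interim #define strchr(s, c) index(s, c) macro can be removed from the header.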
When you reference a missing function, you will still get a meaningful error message, but won't have to wait for the linker to tell you what you already know. Listing 2 shows the <string.h> that I use on UNIX 4.2 BSD. It includes some interim macro definitions for missing functions. The #ifndef ... #endif wrapper around the entire header prevents repeated compilation of the declarations if the header is included more than once. The wrapper isn't needed for protection since you can redeclare functions (provided all declarations in the same scope are the same), and everything else in the header is either benign or protected. I added the wrapper to simplify debugging. While debugging macros, I sometimes look at the preprocessor output to verify the expansions. Eliminating redundant headers from preprocessor output makes it easier to read. The comment at the header's beginning is not in the wrapper so it still appears wherever the header is included, even if the rest of the header does not. One final word of caution. In Listing 2, strlen is declared to return a size_t, even though strlen is actually defined in the library to return an int. On machines where a signed int to unsigned int conversion performs no transformation of the data (as on twos-complement machines), strlen returning a size_t is perfectly safe. On other machines, you should leave the declaration as int strlen(); so that the compiler can recognize that size_t n; n = strlen(s); involves a signed to unsigned conversion and generate the proper code. You should also cast the result of strlen to size_t whenever strlen is used in an expression with other ints, such as if ((size_t)strlen(s) > 0) This is the same technique used with sizeof when it returns an int. Conclusion In this article I've tried to show why it's impossible to just publish a single portable version of the standard headers. The headers provide a portable definition of the Standard C environment, but they do it in a non-portable way.
Rather than writing the missing string functions in the library, I suggest you write the remaining standard headers. Doing so solves more portability problems and gives you the definitions you need to compile new library functions as you write them. In <string.h>, you've already seen many of the design problems, so most of the remaining work is simply determining what goes into the other headers.

References

Darnell, Peter and Margolis, Philip, Software Engineering in C (1988, Springer-Verlag).
Gardner, James, From C to C: An Introduction to ANSI Standard C (1989, Harcourt Brace Jovanovich).
Jaeschke, Rex, Portability and the C Language (1989, Hayden Books).
Plauger, P.J. and Brodie, Jim, Standard C (1989, Microsoft Press).
Ritchie, Dennis and Kernighan, Brian, The C Programming Language, 2nd ed. (1988, Prentice-Hall).

Table 1 Standard Headers

assert.h - program diagnostics
ctype.h - character testing and case mapping
errno.h - error reporting
float.h - floating type characteristics
limits.h - integral type sizes
locale.h - local customs
math.h - mathematics
setjmp.h - non-local jumps
signal.h - signal handling
stdarg.h - variable-length argument lists
stddef.h - common definitions
stdio.h - input and output
stdlib.h - general utilities
string.h - string handling
time.h - date and time utilities

Table 2 Summary of <string.h>

Macros: NULL
Types: size_t
Function Prototypes:
void *memchr(const void *, int, size_t);
int memcmp(const void *, const void *, size_t);
void *memcpy(void *, const void *, size_t);
void *memmove(void *, const void *, size_t);
void *memset(void *, int, size_t);
char *strcat(char *, const char *);
char *strchr(const char *, int);
int strcoll(const char *, const char *);
int strcmp(const char *, const char *);
char *strcpy(char *, const char *);
size_t strcspn(const char *, const char *);
char *strerror(int);
size_t strlen(const char *);
char *strncat(char *, const char *, size_t);
int strncmp(const char *, const char *, size_t);
char *strncpy(char *, const char *,
size_t);
char *strpbrk(const char *, const char *);
char *strrchr(const char *, int);
size_t strspn(const char *, const char *);
char *strstr(const char *, const char *);
char *strtok(char *, const char *);

Listing 1

/*
 * write the definition for size_t
 */
#include <stdio.h>

main()
{
    printf("typedef unsigned%s size_t;\n",
        sizeof(sizeof(int)) == sizeof(int) ? "" : " long");
}

Listing 2

/*
 * string.h - string handling (for cc on UNIX 4.2 BSD)
 */
#ifndef _STRING_H_INCLUDED

#define NULL ((char *)0)

#ifndef _SIZE_T_DEFINED
typedef unsigned size_t;
#define _SIZE_T_DEFINED
#endif

char *strcat();
int strcmp();
char *strcpy();
size_t strlen();
char *strncat();
int strncmp();
char *strncpy();

/*
 * interim macro definitions for functions
 */
char *index();
#define strchr(s, c) index(s, c)

extern int sys_nerr;
extern char *sys_errlist[];
#define strerror(e) \
    ((e) < sys_nerr ? sys_errlist[e] : "?no message?")

char *rindex();
#define strrchr(s, c) rindex(s, c)

/*
 * missing functions
 */
char *memchr();
int memcmp();
char *memcpy();
char *memmove();
char *memset();
int strcoll();
size_t strcspn();
char *strpbrk();
size_t strspn();
char *strstr();
char *strtok();
size_t strxfrm();

#define _STRING_H_INCLUDED
#endif

UNIX 'termcap' Facility Improves Portability By Hiding Terminal Dependencies

Ronald Florence

Ronald Florence is a novelist, sheep farmer, occasional computer consultant, and UNIX addict. He can be reached at ron@mlfarm or ... {hsi,rayssd}!mlfarm!ron.

For programmers accustomed to writing for single-user systems, UNIX (and Xenix) holds some quick surprises. All those carefully optimized, hand-coded screens, the lightning-fast displays that write to the screen buffer, even "well-behaved" routines that rely on BIOS calls, are suddenly useless. Terminal displays, including the console, are treated as teletype devices under UNIX. To perform even the simplest screen display function, such as clearing the screen, the program must send the proper screen control sequence.
In effect, all screen displays are comparable to using the ANSI.SYS driver under MS-DOS. If the UNIX system had only a single terminal or if only one type of terminal were used on the system, it would be easy enough to hand-code the proper screen control sequences. Indeed, even if several different terminals are used on a system, the screen control sequences can be hand coded. For example, the function in Listing 1 could be used to clear screens. For a closed system where most of the output is teletype format, with only simple screen display commands, your programs may not need much more. But what if the system is not closed? What if there are outside logins using a variety of terminals? And what if you want to write screen displays that utilize a wide range of terminal capabilities, including automargins and optimized cursor motion, and make sure those displays are scaled to the size of the terminal display? And what if some of the terminals using the system require padding at certain speeds or have other quirks that make them unsuitable or tricky to use with fancy screen display programs? It is possible to keep adding options to code like Listing 1, but by the tenth terminal type, the code starts to look like linguini. The alternative is to use the termcap and terminfo databases of screen display parameters and control sequences which are provided with most UNIX systems. Termcap, which was developed at Berkeley, uses an ASCII database; the terminfo database is compiled. A curses library of screen display and terminal input functions is supplied with both systems. Terminfo is theoretically faster; it supports many terminal capabilities which are normally not encoded into the termcap database, and the curses library supplied with terminfo has many capabilities which are not supported under termcap curses. 
The termcap database is substantially easier to modify, and there are ways to incorporate many of the capabilities of the terminfo curses into programs running on termcap systems. This article will discuss only termcap, which is used by Xenix and by most BSD systems. The UNIX documentation describes the termcap routines as "low level" and the curses routines as "higher level," in much the way that troff/nroff is a low level formatting package, and the formatting macro packages (MM or MS) are high level. Actually, the analogy is not really appropriate. Curses is a screen optimization package with some convenient windowing functions. Termcap is a straightforward package of functions to access the database of screen and keyboard control sequences. The termcap database is normally in the file /etc/termcap. Comments in the file are prefaced with a # character. All lines which do not begin with the # are considered part of the database. Each entry in the database represents a different terminal. The entry begins with alternate names of the terminal, separated by '|' characters. Usually the first name listed for the terminal is a special two-character abbreviation, used by some older programs. The second name is used by most utilities, such as the editor vi. The last name listed is the full name of the terminal, and is the only name which can have blanks inserted for readability. Thus: d1|vt100|vt-100|pt100|pt-100|dec vt100: are the names of a DEC vt-100. If you add terminal descriptions to the termcap database, make sure that every name in your addition is unique. The capabilities of the terminal are listed after the name, separated from one another by colons. Newlines in the entry must be escaped with a backslash. The capabilities are strings, boolean, or integers. Most are mnemonic. Boolean capabilities are true if named. Strings follow an equals sign (=). Integers follow a #.
There are no spaces or tabs within capabilities or between them, and an entry carried to a second line must repeat the :. Thus: MT|myterm|My Special Terminal:\ :bs:am:cl=\E[J:ho=\E[H:lines#24: indicates that myterm can backspace (bs), has automatic margins (am), that there are 24 lines displayed on the screen, and gives the sequences that should be sent to clear the screen (cl) and home the cursor (ho). Several special sequences are used to encode the strings: \E is the escape character (0x1b); ^X is "Control-X" or any other control key; \n, \r, \t, \b, and \f are newline, carriage return, tab, backspace, and form feed; \^ is ^, and \\ is \. All non-printing characters may be represented as octal escapes; the :, which is used to separate capabilities in each entry, must be entered as \072 if used in a string. Null characters can be entered as \200 because the routines that process termcap entries strip the high bits of the output late, so that \200 comes out \000. Padding can be encoded into the strings by prefacing the string with an integer, representing milliseconds of delay. An integer and a * indicate that the delay is proportional to the number of lines involved in the execution of the command. When the * is used, the delay can be stated in tenths of a millisecond, so that 3.5* before the string for ce (clear to end of line) would mean that the command requires 3.5 milliseconds of padding for each line that is to be cleared. Terminals which are identical to another entry with few exceptions can make use of the tc string and the @ negator. NT|newterm|My alternate terminal:lines#25:bs@:tc=vt100: describes a terminal with 25 lines, no backspace capability, but otherwise identical to a vt100. One caution in using entries with tc encoding: programs with a fixed stack (such as Xenix 286) may crash when reading tc encoded entries. The cure is to make the stack larger with the -F option on the compile command line.
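The capability syntax just described can be made concrete with a toy version of the library's tgetnum(), which scans an entry for a numeric capability and returns its value. This is an illustrative sketch only, under simplifying assumptions: it works on a single entry string already in memory, and unlike the real routine it does not follow tc= links or consult the TERMCAP environment variable; toy_tgetnum is a made-up name:

```c
#include <string.h>

/* Toy tgetnum(): scan a termcap entry string for a numeric
 * capability such as "lines#24" and return its value, or -1 if the
 * capability is absent.  Splitting on ':' is safe because a literal
 * colon inside a capability string must be written as \072. */
int toy_tgetnum(const char *entry, const char *id)
{
    size_t n = strlen(id);
    const char *p = entry;

    while ((p = strchr(p, ':')) != NULL) {
        ++p;                              /* capability starts after ':' */
        if (strncmp(p, id, n) == 0 && p[n] == '#') {
            int val = 0;
            for (p += n + 1; *p >= '0' && *p <= '9'; ++p)
                val = val * 10 + (*p - '0');
            return val;
        }
    }
    return -1;
}
```

A boolean lookup (tgetflag) differs only in testing for ':' after the name, and a string lookup (tgetstr) in testing for '=' and copying up to the next unescaped colon.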
The cursor addressing string (cm) is coded with printf-like escapes. These are described in detail in the termcap (M) entry in the UNIX documentation. In addition to the regular termcap capabilities, which begin with lower case letters, some UNIX systems utilize extensions. Xenix uses a variety of upper case termcap entries to indicate special PC keys: PU for Page Up, EN for End, GS for Start-Character-Graphics-Mode, and pseudo-mnemonics for eight-bit PC graphics drawing characters. GNU Emacs uses upper- case capabilities to describe terminal command sequences which are not generally used in termcap, such as AL and DL for adding and deleting multiple lines. Programs which use these extended termcap capabilities may not be portable to other UNIX systems. The termcap library provides functions to retrieve the encoded information from the database. The termcap routines first search the environment for a TERMCAP variable. If it is found, does not begin with a slash, and the terminal type matches the environment string TERM, the TERMCAP string is read. If it begins with a slash, it is read as the pathname of the termcap database (instead of the default /etc/termcap). Using the environment variable instead of searching the database will speed up the development of new termcap entries. If your system has a tset command which supports separate TERM and TERMCAP environment entries, it will also speed the startup of programs which use termcap. One obvious use for the termcap database is in displaying formatted text to the screen. Although there are wordprocessing programs available to run under UNIX and/or Xenix, much text processing in UNIX systems is done by using an editor (vi or emacs) to prepare the text with nroff/troff formatting codes, usually with one of the macro packages such as MM. The formatted file is then piped to a printer or type-setter, or to a screen display for proofing. 
Although it is possible to prepare nroff terminal driving tables to encode the screen control sequences needed for such formatting features as bold type, italics or underlining, a different table would have to be encoded and compiled for each terminal, and the user would have to indicate the terminal type on the nroff command line: nroff -cm -Tmyterm myfile Also, the nroff terminal driving table format was created when daisy-wheel printers were the cutting edge of desktop hardcopy capabilities, and the coding is sometimes awkward to adapt to the capabilities of a terminal display. For simple text formatting, it is easier to parse the default nroff output, which uses backspaces and overstrikes to generate underlined or bold characters, and use termcap to look up the appropriate standout (bold) and underline sequences. The program in Listing 2 (Bold.c) uses termcap library functions to look up the terminal screen control sequences for so and se (standout start and standout end), us and ue (underline start and underline end), and sg, which is an integer coded quantity indicating how many spaces the attribute change to standout mode requires. For terminals with multiple fonts, the switchover to italic font could be encoded in us, so that underlined text would be displayed in italics. A bold screen attribute could be encoded in so and se, so that bold text would be displayed in bold font, instead of in reverse video. Alternately, new termcap entries could be created to hold the screen control sequences for bold or italic fonts. The termcap access functions are simple and straightforward. To parse the database, you need to allocate a buffer of 1024 characters (tbuf in Listing 2), to hold the entire termcap entry as it is retrieved by tgetent(). This buffer must be retained through all calls to the three functions which parse capabilities: tgetstr(), tgetflag(), and tgetnum().
Another buffer (sbuf in Listing 2) should be allocated for the strings which will be retrieved by tgetstr(). This should be a static buffer. The tgetstr() function is passed the address of a pointer to this buffer. As string capabilities are read, they are placed in the buffer, and the pointer is advanced. Using a static buffer saves the overhead of allocating space for each string as it is retrieved. The termcap library also provides a function tputs(), which correctly sends screen control sequences to the display, including any needed padding. tputs() requires a pointer to a user-supplied function which can display a single character. The function prch() (Listing 2) invokes the macro putchar(). Although it is not used here, the termcap library includes one other function, tgoto(), which uses the cm (cursor movement) string to go to a desired column and line. Because tgoto() will output tabs, programs which make tgoto() calls should turn off the TAB3 bit when setting the line protocol. The function putout() in Listing 2 is not really necessary. It is used here to check for insertions of ^G (0x7) in the text files. ^G was chosen because it passes through nroff transparently. It is used to trigger expanded font in files sent to the printer. In Bold.c, it triggers the insertion of a space between characters to simulate expanded font. Termcap can also be used to retrieve the sequences sent by non-ASCII keys, like the arrow or function keys. Although the termcap curses library does not use the arrow or function keys, the keys can be added to programs which use curses for screen control by making a second set of termcap calls (curses makes its own calls to termcap), and then reading for the arrow or function key sequences in a getkey() routine (see Listing 3, keys.c).
Reading arrow keys for terminals which use a single character code for each arrow (such as ^H, ^J, ^K, ^L) is simple, but many terminals, such as the PC console, send escape-prefaced strings (ESC[A, ESC[B, etc.) when the arrow keys or other non-ASCII keys are pressed. Some systems may balk at reading strings with a simple read() system call. It is worth fiddling with the VMIN and VTIME values in structure termio if you cannot read key sequences with the code in getkey(). The values in function fixquit() in Listing 3 are a good start. The alternative is to put the strings together out of characters read one at a time. This may be the most reliable technique for an editor or other program that reads repeated sequences of fast input characters that might be misinterpreted, such as an ESC followed by a [ and an alphabetic character, which an ANSI terminal might interpret as a screen control sequence. The trick if you are reading a character at a time is to distinguish between a lone ESC (0x1b) and an ESC sent as the first character of an escape sequence. One technique is to set a timeout alarm. If you get the characters that would constitute a key string before the timeout, return the key string, otherwise return an ESC followed by individual characters. The whole procedure takes tinkering, and fast typists can foul it up. Hence, using a read() call is simpler. One problem that can arise with the arrow keys is that ^\, the UNIX "quit" character, is used as an arrow key on some terminals. Even if the "quit" signal is disabled, the keys will still be intercepted. The easiest fix to the problem is to change the "quit" key to an impossible value. The function fixquit() does this. The global variable ttytype is set by the curses termcap routines, which in this program are called before lookupkeys(). The ttytype could be set by a call to getenv(), as in the code for Listing 1.
The header file in Listing 4 (keys.h) defines integer equivalents for the arrow and function keys; these defines can be used in switch statements. (The values given are those used in the terminfo header files.) What termcap cannot do is to optimize screen output by cutting down the overhead of repeated cursor movement sequences. The output routines in the curses library do a fair job and are simple to use. The code for life.c in Listing 5 uses these routines along with the arrow key routines from keys.c, and while the speed of output cannot compare with an optimized routine writing directly to screen memory, it is quick enough on a console or a terminal running at 19,200 baud. Highly optimized screen output which requires even more efficiency could mean a journey into the treacherous code of screen display routines which calculate the cost of each move. One such package is the display routines in the Gosling Emacs code, which quite properly carries a dire warning to those who would venture into the tangles of the code. Listing 1 cls () { char *getenv(), *term, *cl; term = getenv ("TERM"); if (!strcmp(term, "ansi")) cl = "\033[2J\033[H"; else if (!strcmp(term, "wy50")) cl = "\033*"; /* add other terminals ...
*/ /* if all else fails, try a form feed */ else cl = "\f"; fputs (cl, stderr); } Listing 2 /* * Bold.c - filters nroff output for terminal display * displays bold in SO, underlines, expanded font * copyright 1987 Ronald Florence */ #include <stdio.h> #define UL 01 #define BOLD 02 #define ULSTOP 04 #define Bold() tputs(so, 1, prch), att |= BOLD #define Stopbold() tputs(se, 1, prch), att &= ~BOLD #define Uline() tputs(us, 1, prch), att |= UL #define Stopuline() tputs(ue, 1, prch), att &= ~(UL | ULSTOP) prch(c) register char c; { putchar(c); } char *so, *se, *us, *ue; main() { static char sbuf[256]; char tbuf[1024], *fill = sbuf, *tgetstr(), *getenv(); register a, c; int i, att = 0; if (tgetent(tbuf, getenv("TERM")) == 1 && tgetnum("sg") < 1) { so = tgetstr("so", &fill); se = tgetstr("se", &fill); us = tgetstr("us", &fill); ue = tgetstr("ue", &fill); } a = getchar(); while ((c = getchar()) != EOF) { if (a == '_') { if (c == '_' && (att & UL) == 0) Uline(); else if (c == '\b') /* nroff italics */ { if ((a = getchar()) == EOF) a = 0; c = getchar(); if ((att & UL) == 0) Uline(); } if (c != '_' && (att & UL)) /* c is the last underline */ att |= ULSTOP; } else if (c == '\b') { if ((c = getchar()) != a) { /* Not a bold: print the character and pass the \b to be printed.
*/ putout(a); a = '\b'; } else { if ((att & BOLD) == 0) Bold(); for (i = 0; i < 5; i++) if ((c = getchar()) != a && c != '\b') break; } } else if (att & BOLD) Stopbold(); putout(a); if (att & ULSTOP) Stopuline(); a = c; } } putout(c) register char c; { static int expanded; if (c == 07) /* ^G signals expanded font */ { expanded++; return(0); } putchar(c); if (expanded) { if (c == '\n') expanded = 0; else putchar(' '); } } Listing 3 /* * keys.c - gets arrow and function keys from termcap, * returns terminfo codes * changes quit key for use as arrow * * define NO_SYSV for versions of curses that do not look up * arrow & function keys from termcap * * copyright 1988 Ronald Florence * changed VMIN & VTIME for wy99 @ 9600 ron@mlfarm (7/11/88) */ #include <curses.h> #ifndef KEY_DOWN #include "keys.h" #endif #define NKEYS 16 char #ifdef NO_SYSV *tcap_ids[] = { "kd", "ku", "kl", "kr", "kh", "kb", "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7", "k8", "k9", 0 }, #endif *fkeys[NKEYS]; lookupkeys() { #ifdef NO_SYSV static char sbuf[256]; char **key, tbuf[1024], *fill = sbuf, *tgetstr(); int i = 0; tgetent(tbuf, ttytype); for (key = tcap_ids; *key; ++key) fkeys[i++] = tgetstr(*key, &fill); #else fkeys[0] = KD; fkeys[1] = KU; fkeys[2] = KL; fkeys[3] = KR; fkeys[4] = KH; fkeys[5] = KB; fkeys[6] = K0; fkeys[7] = K1; fkeys[8] = K2; fkeys[9] = K3; fkeys[10] = K4; fkeys[11] = K5; fkeys[12] = K6; fkeys[13] = K7; fkeys[14] = K8; fkeys[15] = K9; #endif fixquit(); } getkey() { char cmd[7]; register k; k = read(0, cmd, 6); cmd[k] = '\0'; for (k = 0; k < NKEYS; k++) if (strcmp(cmd, fkeys[k]) == 0) return (k + KEY_DOWN); return ((int) *cmd); } fixquit() { struct termio new; ioctl(0, TCGETA, &new); new.c_cc[VQUIT] = 0xff; /* in case QUIT is an arrow */ new.c_cc[VTIME] = 1; /* minimum timeout */ new.c_cc[VMIN] = 3; /* three characters satisfy */ ioctl(0, TCSETA, &new); } Listing 4 /* * keys.
h * copyright 1988 Ronald Florence * * use with curses programs that need extended keyboard * (if tcap.h does not include the defines) */ #define KEY_DOWN 0402 #define KEY_UP 0403 #define KEY_LEFT 0404 #define KEY_RIGHT 0405 #define KEY_HOME 0406 #define KEY_BACKSPACE 0407 #define KEY_F0 0410 #define KEY_F(n) (KEY_F0 + (n)) Listing 5 /* life.c copyright 1985, 1988 Ronald Florence compile: cc -O -s life.c keys.c -lcurses -ltermcap -o life */ #include <curses.h> #include <signal.h> #ifndef KEY_DOWN #include "keys.h" #endif #define ESC 0x1b #define life '@' #define crowd (life + 4) #define lonely (life + 2) #define birth (' ' + 3) #define minwrap(a,d) a = --a < 0 ? d : a #define maxwrap(a,d) a = ++a > d ? 0 : a #define wrap(a,z) if (a < 0) (a) += z; \ else if (a > z) (a) = 1; \ else if (a == z) (a) = 0 #define MAXX (COLS-1) #define MAXY (LINES-3) #define boredom 5 typedef struct node { int y, x; struct node *prev, *next; } LIFE; struct { int y, x; } pos[8] = { { 1,-1}, {1, 0}, {1, 1}, {0, 1}, {-1, 1}, {-1, 0}, {-1,-1}, { 0,-1} }; LIFE *head, *tail; extern char *malloc(); char *rules[] = { " ", "The Rules of Life:", " ", " 1. A cell with more than three neighbors", " dies of overcrowding.", " 2. A cell with less than two neighbors", " dies of loneliness.", " 3.
A cell is born in an empty space", " with exactly three neighbors.", " ", 0 }, *rules2[] = { "Use the arrow keys or the vi cursor keys", "(H = left, J = down, K = up, L = right)", "to move the cursor around the screen.", "The spacebar creates and destroys life.", " starts the cycle of reproduction.", " ends life.", " ", "Press any key to play The Game of Life.", 0 }; main(ac, av) int ac; char **av; { int i = 0, k, die(); initscr(); crmode(); noecho(); signal(SIGINT, die); lookupkeys(); head = (LIFE *)malloc(sizeof(LIFE)); /* lest we have an unanchored pointer */ tail = (LIFE *)malloc(sizeof(LIFE)); head->next = tail; tail->prev = head; if (ac > 1) readfn(*++av); else { erase(); if (COLS > 40) for ( ; rules[i]; i++) mvaddstr(i+1, 0, rules[i]); for (k = 0; rules2[k]; k++) mvaddstr(i+k+1, 0, rules2[k]); refresh(); while (!getch()) ; setup(); } nonl(); while (TRUE) { display(); mark_life(); update(); } } die() { signal(SIGINT, SIG_IGN); move(LINES-1, 0); refresh(); endwin(); exit(0); } kill_life(ly, lx) register int ly, lx; { register LIFE *lp; for (lp = head->next; lp != tail; lp = lp->next) if (lp->y == ly && lp->x == lx) { lp->prev->next = lp->next; lp->next->prev = lp->prev; free(lp); break; } } display() { int pop = 0; static int gen, oldpop, boring; char c; register LIFE *lp; erase(); for(lp = head->next; lp != tail; lp = lp->next) { mvaddch(lp->y, lp->x, life); pop++; } if (pop == oldpop) boring++; else { oldpop = pop; boring = 0; } move(MAXY+1, 0); if (!pop) { printw("Life ends after %d generations.", gen); die(); } printw("generation - %-4d", ++gen); printw(" population - %-4d", pop); refresh(); if (boring == boredom) { mvprintw(MAXY, 0, "Population stable. Abort? "); refresh(); while (!(c = getch())) ; if (toupper(c) == 'Y') die(); } } mark_life() { register k, ty, tx; register LIFE *lp; for (lp = head->next; lp; lp = lp->next) for (k = 0; k < 8; k++) { ty = lp->y + pos[k].y; wrap(ty, MAXY); tx = lp->x + pos[k].x; wrap(tx, MAXX); stdscr->_y[ty][tx]++; } } update() { register int i, j, c; for (i = 0; i <= MAXY; i++) for (j = 0; j <= MAXX; j++) { c = stdscr->_y[i][j]; if (c >= crowd || c >= life && c < lonely) kill_life(i, j); else if (c == birth) newlife(i, j); } } setup() { int x, y, c, start = 0; erase(); y = MAXY/2; x = MAXX/2; while (!start) { move(y, x); refresh(); switch (c = getkey()) { case 'h' : case 'H' : case ('H' - '@'): case KEY_LEFT: case KEY_BACKSPACE: minwrap(x, MAXX); break; case 'j' : case 'J' : case ('J' - '@'): case KEY_DOWN: maxwrap(y, MAXY); break; case 'k' : case 'K' : case ('K' - '@'): case KEY_UP: minwrap(y, MAXY); break; case 'l' : case 'L' : case ('L' - '@'): case KEY_RIGHT: maxwrap(x, MAXX); break; case ' ' : if (inch() == life) { addch(' '); kill_life(y, x); } else { addch(life); newlife(y, x); } break; case 'q' : case 'Q' : case ESC : ++start; break; } } } newlife(ny, nx) int ny, nx; { LIFE *new; new = (LIFE *)malloc(sizeof(LIFE)); new->y = ny; new->x = nx; new->next = head->next; new->prev = head; head->next->prev = new; head->next = new; } readfn(f) char *f; { FILE *fl; int y, x; if ((fl = fopen(f, "r")) == NULL) errx("usage: life [file (line/col pts)]\n", NULL); while (fscanf(fl, "%d%d", &y, &x) != EOF) { if (y < 0 || y > MAXY || x < 0 || x > MAXX) errx("life: invalid data point in %s\n", f); mvaddch(y, x, life); newlife(y, x); } fclose(fl); } errx(m,d) char *m, *d; { fprintf(stderr, m, d); endwin(); exit(0); } Fitting Curves To Data Michael Brannigan Michael Brannigan is President of Information and Graphic System, IGS, 15 Normandy Court, Atlanta, GA 30324 (404) 231-9582. IGS is involved in consulting and writing software in computer graphics, computational mathematics, and data base design.
He is presently writing a book on computer graphics algorithms. He is also the author of The C Math Library EMCL, parts of which are the routines set out here. Fitting curves to data ranks as one of the most fundamental needs in engineering, science, and business. Curve fitting is known as regression in statistical applications, and nearly every statistical package, business graphics package, math library, and even spreadsheet software can produce some kind of curve from given data. Unfortunately, the process and the underlying computational mathematics are not sufficiently understood, even by the software firms producing the programs. It is not difficult, for example, to construct input data for which the linear regression routine of a well known statistical package (which I shall not name), used on micros and mainframes, produces incorrect output. Constructing a functional approximation to data (the formal act known as curve fitting) involves three steps: choosing a suitable curve, analyzing the statistical error in the data, and setting up and solving the required equations. Choosing a suitable curve is a mixture of artistic sensibility and a knowledge of the data and where it comes from. Analyzing statistical error can be something of a guessing game and requires some thought. Setting up and solving the equations is computationally the most interesting. It is here that many programs fail because they use computationally unstable methods, but more of that later. The number of methods for data fitting is legion and we suggest some in this article. However, we give only one method in full and consider only 2-D data. Anyone interested in other specific data fitting techniques may contact the author. Problem Given data points (xi, yi), i = 1,...,n, we suppose there exists a relation yi = F(xi) + ei, i = 1,...,n where F(x) is an unknown underlying function and ei represents the unknown errors in the measurements yi. The problem is to find a good approximation f(x) to F(x).
We thus need a function f to work with and some idea, however minimal, of the errors.

How To Choose f

f will have variable parameters whose correct values (the values that solve the approximation problem) are found by solving a system of equations, each data point defining one equation. We call the function f linear or non-linear if it is linear or non-linear in its parameters. Consider some of the general principles involved in choosing a suitable function f. We must have more data points than parameters; otherwise f will fit the data exactly and we will not model the errors. Unless absolutely necessary, don't use a non-linear f; solving systems of non-linear equations uniquely is, except for special cases, nearly impossible. In most cases polynomials are not a good choice; they are wiggly curves and nearly always wiggle in the wrong places. The best option in most cases is to use piecewise polynomials. The example we give is a piecewise cubic polynomial whose first derivatives are continuous everywhere. (You can, of course, use cubic splines if you want second derivatives to be continuous, but in most cases the example set out here is superior for a general-purpose curve fitting routine. If you want the full cubic spline, use the B-spline formulation and no other; otherwise you get unstable systems of equations resulting in incorrect solutions. To use the B-spline formulation for spline approximation, you need only change the routine coeff_cubic() in the program given in this article. The system of equations is solved by the same routines.) Once f has been chosen and applied to each data point, we obtain a system of linear equations to solve, where the number of equations will be greater than the number of unknowns. Such a system is called an overdetermined system, and no exact solution exists -- which is exactly what we want.
However, overdetermined systems have an infinite number of inexact (approximate) solutions; we will seek an approximation that minimizes some particular error measure. (Mathematicians call these error measures "norms"; thus the problem of curve fitting becomes an optimization problem.) Of the infinite possible norms, three should be considered for any curve fitting package: the L1-norm, the L2-norm (least squares norm), and the minimax (Chebyshev) norm. (These norms are defined later in this article.) Fortunately good algorithms exist for solving overdetermined systems of linear equations in all three norms. For the L1-norm and the minimax norm, you use a variation of the simplex method of linear programming; for the L2-norm you use a QR decomposition of the matrix in preference to the computationally unstable method of solving the normal equations. (We cannot give all the program code here as space is limited, but for more guidance the reader can contact the author.) Of many possible combinations the following solution is a good general-purpose option.

Solution

We have data points (x_i, y_i), i = 1,...,n. Let each x_i belong to some interval [a,b]. Specify k points X_j, j = 1,...,k, on the X-axis; we call these points knots. These knots are such that

    a = X_1 < X_2 < ... < X_k = b

We can now define our function as follows: for each x in the interval [X_j, X_j+1] define the cubic polynomial

    y = [(d^3 - 3dp^2(x) + 2p^3(x))Y_j + dp(x)q^2(x)Y_j'
         + (3d - 2p(x))p^2(x)Y_j+1 - dp^2(x)q(x)Y_j+1'] / d^3

where

    d = X_j+1 - X_j
    p(x) = x - X_j
    q(x) = X_j+1 - x

Thus y is a cubic polynomial with the linear parameters Y_j, Y_j+1, Y_j', Y_j+1', which are the values and first derivatives at the knots X_j and X_j+1 respectively. For each data point we obtain one linear equation, so we can set up n linear equations in the 2k unknowns Y_1, Y_1', ..., Y_k, Y_k'. In matrix form this can be written as AY = b, where A is a block diagonal matrix, Y is the vector of unknowns, and b is the vector of y values.
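The four basis coefficients in the cubic above can be checked numerically. The following sketch (my own illustration, mirroring the val[] computation in the article's app_cubic(); the function name is mine) evaluates them; at x = X_j it yields weights (1,0,0,0) and at x = X_j+1 weights (0,0,1,0), confirming the interpolation properties at the knots:

```c
#include <math.h>

/* Evaluate the four Hermite basis weights for x in [Xj, Xj1].
   val[0], val[2] multiply the knot values Yj, Yj+1;
   val[1], val[3] multiply the knot derivatives Yj', Yj+1'. */
void hermite_basis(double x, double Xj, double Xj1, double val[4])
{
    double d = Xj1 - Xj;         /* knot spacing            */
    double p = x - Xj;           /* p(x) in the article     */
    double q = Xj1 - x;          /* q(x) in the article     */
    double d3 = d * d * d;

    val[0] = (d3 - 3.0 * d * p * p + 2.0 * p * p * p) / d3;
    val[1] = d * p * q * q / d3;
    val[2] = (3.0 * d - 2.0 * p) * p * p / d3;
    val[3] = -d * p * p * q / d3;  /* note the minus sign */
}
```

The curve value at x is then the dot product of val[] with (Y_j, Y_j', Y_j+1, Y_j+1'), exactly as app_cubic() computes it.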
Because A is block-diagonal, for very large data sets optimal use should be made of the structured sparsity. With the same knots we could also define cubic B-splines and then fit a cubic spline to the data. We would again arrive at an overdetermined system of linear equations with a matrix of coefficients having block-diagonal structure. In fact the equations we have set out above form a cubic spline with each knot X_j a double knot.

Choosing A Norm

For each possible solution Y we have errors s_i, i = 1,...,n, such that AY - b = s, where s is the vector of s_i values. The L1-norm is defined to be

    sum over i of abs(s_i)

The L2-norm or least squares norm is

    (sum over i of s_i^2)^(1/2)

And the minimax or Chebyshev norm is

    max(abs(s_i) : i = 1,...,n)

We solve the overdetermined system of equations by finding that vector Y which minimizes one of these norms. The choice of norm depends on the unknown errors e_i, and we hope that the choice of norm will give errors s_i that mirror these unknown errors. The general rule is: choose the L1-norm if the e_i are scattered (belong to a long-tailed distribution); choose the L2-norm if the e_i are normally distributed; choose the minimax norm if the e_i are very small or belong to a uniform distribution. Research has indicated that data sets have errors nearer to the L1-norm than the L2-norm. (Errors in data are never normally distributed, neither as they are nor in the limit. The assumption of normally distributed errors is common in most packages; the user should question this assumption very carefully.) So when you don't know how the errors are distributed, use the L1-norm. The minimax norm is rarely used for fitting curves to experimental data. However, always use the minimax norm if you want to fit a function to another function, for example fitting a Fourier series to a complicated function where you know the values exactly. Whichever norm you choose, the computer solution of the equations is not straightforward.
You must choose an algorithm that is computationally stable. (A computationally unstable algorithm is one that is mathematically correct but, when fed into a computer, produces wrong answers -- for example, solving linear equations without pivoting, or solving quadratic equations from the well-known formula. So get some professional help in choosing the algorithm.)

Program

After you have spent some time analyzing your particular data fitting problem, decided upon a suitable function to approximate the data, and also decided upon the norm to use for the errors in the data, you must program the result. Unless your application requires special functions, the approximating function set out above is a good general-purpose choice. The programming for this function or any other has the same form: the system of equations is set up with one equation for each data point, and then the system is solved with the required norm. For the function described here the programming is just as straightforward. The main routine is Hermite(), named after the mathematician who defined these piecewise polynomials. The routine first gives the user the choice (by setting the variable flag) of either setting the k knots lambda[] on input or using the routine app_knots() to compute the knots. In most cases the user will not use the routine just once, but will compute a first approximation and then alter the position of the knots for a second approximation. For a first approximation set flag to true and use app_knots() to compute the knots automatically. Then look at the result and choose new knots. A more sophisticated method automatically chooses the number of knots k and their position. Once the knots are defined the routine allocates space for the matrix A of size n x 2k. After making sure all elements of the matrix are zero, the routine calls coeff_cubic() to set up the coefficients of the matrix. Now the program solves the overdetermined system in the appropriate norm.
The variable norm is set by the user to indicate which norm to use. (We do not give here the three routines that solve the overdetermined system of equations, as they require lots of space, but the reader can find the algorithms in most computational mathematics textbooks.) The routine L1_approx() uses the L1-norm, the routine CH_lsq() uses the least squares norm, and the routine Cheby() uses the minimax norm. With the solution from the appropriate routine, the function now fits the data. Some words on the other routines. First, the routine app_knots() will compute k knots lambda[j] so that in each interval (lambda[j], lambda[j+1]) there are approximately the same number of x values. This is a good starting point for our Hermite approximation and for any spline approximation that needs knots. The routine coeff_cubic() is merely a direct translation of the formulae. This routine uses interval(), which finds to which knot interval each x value belongs. coeff_cubic() also uses the macro SUB() to move around the matrix (this is my preferred method for dealing with matrices passed as parameters). Finally there is the routine app_cubic(). This routine uses the results from Hermite() to compute the piecewise polynomial for any value of x. Thus app_cubic() completes the curve fitting problem.

Example

An example (using data from actual measurements of the horizontal angular displacement of an elbow flexion against time) will show how the pieces fit together. There are 142 measured points and these measurements are quite accurate (the experimenters knew the kind of instruments they were using -- see the paper by Pezzack, et al). In this instance a close fit to the data points is required. In all the figures the dark circles are the knots and the crosses are the data points. The solution is in Figure 1. Figure 2 shows the result when the L2-norm is used. Figure 3 shows the result when the minimax norm is used.
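The equal-count knot placement that app_knots() performs can be sketched as follows (my own simplified version, assuming the x values are sorted ascending and k >= 3; the article's routine differs in detail):

```c
/* Place k knots over sorted data x[0..n-1] so that each interior
   knot interval contains roughly the same number of data points.
   The end knots are pinned to the data range. */
void equal_count_knots(const double *x, int n, double *lambda, int k)
{
    int j, step = n / (k - 1);   /* points per interval, roughly */

    lambda[0] = x[0];
    for (j = 1; j < k - 1; j++)
        /* put the knot midway between two adjacent samples so
           no data point falls exactly on a knot */
        lambda[j] = 0.5 * (x[j * step - 1] + x[j * step]);
    lambda[k - 1] = x[n - 1];
}
```

Balanced point counts per interval keep every block of the matrix A well populated, which is why this is a sensible default before hand-tuning the knots.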
As would be expected with such "clean" data, the answers are all quite good, the best being Figure 1. To illustrate the behavior in the presence of noise, add some significant errors to the same data points. Using the same curve approximation method, Figure 4, Figure 5, and Figure 6 show the results when using the L1-norm, L2-norm, and minimax norm respectively. As theory suggests, the L1-norm gives definitely superior results. This example is a straightforward application of the method set out here -- well, nearly! You may be asking the six thousand dollar question, "How do I choose the knots?" The answer is not straightforward and contemporary research has different answers. As you can see from the figures, the number and position of the knots changes for each example. The goal is to choose the number of knots and their position so as to give the best fit possible for the norm chosen -- easy to say but not easy to compute. All the knots in each figure have been chosen according to an information theoretic criterion, plus a little experience on the placement of knots. The idea behind this method is to attempt to extract the maximum amount of information from the data points until only error remains. To do this we need a computable value for the amount of information contained in the errors s_i; we suggest using the Akaike Information Criterion. The routine changes the number of knots and their position until there is no more information in the errors. For those readers who wish to go further into this problem, see the papers by Brannigan for a full mathematical treatment of this method, the information theoretic criterion, and an extension to multivariate data.

Bibliography

Pezzack, J.C. et al. "An Assessment of Derivative Determining Techniques Used for Motion Analysis," J. Biomechanics, 10 (1977).

Brannigan, M. "An Adaptive Piecewise Polynomial Curve Fitting Procedure for Data Analysis," Commun. Statist. A10(18), (1981).

Brannigan, M.
"Multivariate Data Modelling by Metric Approximants," Comp. Stats. & Data Analysis 2, (1985). Figure 1 Figure 2 Figure 3 Figure 4 Figure 5 Figure 6 Listing 1 Coeff_Cubic void coeff_cubic (a,p,q,x,y, lambda,k) /* * Set up the equations for the Hermite cubic approximation. */ double *a,*x,*y,*lambda; int p,q,k; { double d,alpha,beta,d3,alpha3; int i,j,col; for (i=0; i x[j-1]) ? 1:0; return(j); } double app_cubic (x,j,lambda,res) /* * Given the result res[] from the routine Hermite() find the value * of y for the given x value. */ double x,*lambda,*res; int j; { double d,alpha,beta,d3,alpha3,sum,val[4]; int i, col; col = 2*(j-1); d = lambda[j] - lambda[j-1]; alpha = x-lambda[j-i]; beta = d-alpha; d3 = d*d*d; alpha3 = alpha*alpha*alpha; val[0] = (d3-3.O*d*alpha*alpha+2.0*alpha3)/d3; val[1] = d*alpha*beta*beta/d3; val[2] = (3.0*d-2.0*alpha)*alpha*alpha/d3; val[3] = -d*alpha*alpha*beta/d3; for (sum=0.0,i=0; i<4; i++) sum += val[i]*res[col+i]; return (sum); } Hermite (Listing 2) #define SUB(i,j,k) (i)*(k)+(j) double Hermite (x,y,n,norm,lambda,k,flag,res,err) /* * Given n data points (x[],y[]) find the Hermite cubic approximation * to this data using the k nots lambda[]. If flag = true then find the * knots from the routine app_knots() otherwise lambda[] is set by the * user. The 2k result is returned in res[] and the error at each point * is returned in err[].The overdetermined system of equations is * solved with respect to the value of norm, uses L1-norm if norm = 1, * uses the L2-norm if norm = 2, and uses the minimax norm if norm = 3. * The return value z is the size of the resultant norm. */ double *x,*y,*lambda,*res,*err; int n,norm,k,flag; { double *a,z; int i,j,l,kk,m,m2; /* * Find whether the knots are to be computed. */ if (flag) app_knots (x,n,lambda,k); /* * Now form the system of equations one equation per data point. */ m2 = n*2*k; /* * Allocate space for the matrix. 
     */
    a = (double*)calloc(m2,sizeof(double));
    if (a==0)
        printf ("\n NO DYNAMIC SPACE AVAILABLE");
    else {
        for (i=0; i2) { i = n/(k-1); j = (n-(i*(k-3)))/2; lambda = x[j]; if (k>3) { s = j; for (t=2; t

#include <stdio.h>
#include <graphics.h>

#define M_POINTER 0     /* mouse shapes */
#define M_CROSS   1
#define ON  1
#define OFF 0
#define MAX_OBJECT 100
#define ESC 27
#define BOX     'b'     /* object types we support */
#define ELLIPSE 'e'
#define LINE    'l'
#define TEXT    't'
#define M_MAIN 1        /* handles for the menus */
#define M_FILE 2
#define M_OBJ  3
#define M_ACT  4
#define A_COPY 1        /* action requests for button() */
#define A_MOVE 2
#define A_EDIT 3
#define min(a,b) ((a) < (b) ? (a) : (b))
#define max(a,b) ((a) > (b) ? (a) : (b))

typedef struct {
    int type, l, t, r, b;
    char select, *data;
} Object;

Object objects[MAX_OBJECT]; /* the table of objects defined so far */
int last_object;            /* the end of the object table */
int map[] = {               /* maps a M_OBJ menu item to an object */
    0, BOX, ELLIPSE, LINE, TEXT
};
char *about =               /* form used on the M_MAIN About item */
    " Draw This! by Mark A.
Johnson %{continue}";
char *help =                /* help message for wrong keyboard input */
    "quit refresh : box line ellipse text : delete copy move edit";
char filename[20];          /* save the filename we're working with */
char text[100];             /* extra buffer for text i/o */
int actn_obj = 0;           /* flag for button(), some action req */
int make_obj = 0;           /* flag for button(), need to create */
int slct_cnt = 0;           /* count of selected objects */
int first;                  /* helps make_object collect points */
int grid = 0;               /* grid displayed, snap coords */
extern int Maxx, Maxy, MaxColor;

/* start routine, called by the application driver, gets things going */
start(argc, argv)
char **argv;
{
    add_menu(M_MAIN, "Main:AboutQuitRefreshGrid");
    add_menu(M_FILE, "File:ReadWriteSavePrint");
    add_menu(M_OBJ, "Objects:BoxEllipseLineText");
    add_menu(M_ACT, "Actions:DeleteCopyMoveEdit");
    menu_state(M_ACT, 0);
    if (argc > 1) {
        strcpy(filename, argv[1]);
        read_objects();
    }
}

/* no timers in this application, but DCUWCU needs an entry anyway */
timer() {}

/* button routine called every time button 1 is depressed */
button(b, x, y)
{
    if (make_obj) {             /* need points to make an object */
        make_object(x, y);
    } else if (actn_obj) {      /* got a point for a copy or move */
        action_object(x, y);
    } else {                    /* do a selection */
        select_object(in_object(x, y));
    }
    check_menu();
}

/* menu routine called every time a menu item is selected */
menu(m, i)
{
    char junk = 0, on;

    switch (m) {
    case M_MAIN:                /* main menu */
        switch (i) {
        case 1: form(about, &junk); break;
        case 2: quit(); break;
        case 3: refresh(); break;
        case 4: do_grid(); break;
        }
        break;
    case M_FILE:                /* file menu */
        if (i < 3 && !get_name())
            break;
        switch (i) {
        case 1: read_objects(); break;
        case 2:
        case 3: write_objects(); break;
        case 4: print(); break;
        }
        break;
    case M_OBJ:                 /* objects */
        start_make(map[i]);
        break;
    case M_ACT:                 /* actions */
        switch (i) {
        case 1: kill_object(); break;
        case 2: start_actn(A_COPY); break;
        case 3: start_actn(A_MOVE); break;
        case 4: start_edit(); break;
        }
        break;
    }
}

/* routine called every time a key is struck */
keyboard(c)
{
    switch (c) {
    case 'p': print(); break;
    case 'g': do_grid(); break;
    case 'r': refresh(); break;
    case 'q': quit(); break;
    case 'b': start_make(BOX); break;
    case 't': start_make(TEXT); break;
    case 'l': start_make(LINE); break;
    case 'm': start_actn(A_MOVE); break;
    case 'c': start_actn(A_COPY); break;
    case 'd': kill_object(); break;
    case 'e':
        if (slct_cnt)
            start_edit();
        else
            start_make(ELLIPSE);
        break;
    default:
        msg(help);
    }
}

/* time to go, see if they really want to */
quit()
{
    char yes = 0, no = 0;
    char *f_exit = "Are you sure? %{yes} %{no}";

    if (form(f_exit, &yes, &no) && no == 0)
        finish();
}

/*
 * miscellaneous support routines
 */

/* reset the current grid size */
do_grid()
{
    char gridval, ok = 0, nok = 0, x;

    switch (grid) {
    case 8:  gridval = 1; break;
    case 16: gridval = 2; break;
    default: gridval = 0; break;
    }
    x = form("Change Grid Size %[none:8:16]%{ok} %{cancel}",
        &gridval, &ok, &nok);
    if (x == 0 || nok)
        return;
    grid = gridval * 8;
    refresh();
}

/* print the current screen somewhere, Epson-compatible graphics mode */
print()
{
    static char grhd[] = { ESC, 'L', 0, 0 };        /* 960 bit graphics */
    static char grlf[] = { ESC, 'J', 24, '\r' };    /* line feed */
    static char prbuf[960];
    int x, y, i, b, n, any, pixel, max;

    max = min(Maxx, 960);
    grhd[2] = max;
    grhd[3] = max >> 8;
    mouse_state(OFF);
    b = 0x80;
    any = 0;
    for (y = 0; y < Maxy; y++) {
        for (x = 0; x < max; x++) {
            if (getpixel(x, y)) {
                any = 1;
                prbuf[x] |= b;
            }
        }
        b >>= 1;
        if (b == 0) {       /* out it goes */
            if (any) {
                prn(grhd, 4);
                prn(prbuf, max);
            }
            prn(grlf, 4);
            b = 0x80;
            any = 0;
            for (x = 0; x < max; x++)
                prbuf[x] = 0;
        }
    }
    mouse_state(ON);
}

/* print the n bytes out the printer port */
prn(s, n)
char *s;
{
    while (n--)
        biosprint(0, *s++, 0);
}

/* select or de-select an object */
select_object(obj)
{
    int i;
    Object *o;

    if (obj == -1) {    /* de-select all */
        for (i = 0; i < last_object; i++) {
            o = &objects[i];
            if (o->select) {
                o->select = 0;
                highlight(o, 0);
            }
        }
        slct_cnt = 0;
    } else {
        o = &objects[obj];
        o->select = !o->select;
        highlight(o, o->select);
        slct_cnt += o->select ? 1 : -1;
    }
}

/* get a filename from the user, return 0 if abort */
get_name()
{
    return form("Path: %20s", filename);
}

/* based on current select state, set the top-most menu */
check_menu()
{
    menu_state(M_ACT, slct_cnt > 0);
    menu_state(M_OBJ, slct_cnt <= 0);
}

/* start to make an object by collecting points */
start_make(type)
{
    char *s;

    switch (make_obj = type) {
    case BOX:     s = "box: top left corner..."; break;
    case ELLIPSE: s = "ellipse: top left corner..."; break;
    case LINE:    s = "line: one end..."; break;
    case TEXT:    s = "text: starting..."; break;
    }
    msg(s);
    mouse_shape(M_CROSS);
    first = 1;
}

/* if enough points have been collected, make the object */
make_object(x, y)
{
    static int fx, fy;

    if (grid)
        snap(&x, &y);
    switch (make_obj) {
    case TEXT:
        *text = 0;
        form("text: %20s", text);
        add_object(TEXT, x, y, x + strlen(text)*8, y+8, text);
        make_obj = 0;
        mouse_shape(M_POINTER);
        msg("");
        break;
    default:
        if (first) {
            fx = x;
            fy = y;
            first = 0;
            line(x-3, y, x+3, y);
            line(x, y-3, x, y+3);
            if (make_obj == LINE)
                msg("other end...");
            else
                msg("bottom right corner...");
        } else {
            add_object(make_obj, fx, fy, x, y, 0L);
            msg("");
            make_obj = 0;
            mouse_shape(M_POINTER);
        }
    }
}

/* snap the coordinates to the nearest grid point */
snap(xp, yp)
int *xp, *yp;
{
    int g2 = grid/2, g4 = grid/4, x = *xp, y = *yp;

    x = ((x + g2) / grid) * grid;
    y = ((y + g4) / g2) * g2;
    msg("x %d->%d y %d->%d", *xp, x, *yp, y);
    *xp = x;
    *yp = y;
}

/* move, copy, or edit a figure */
action_object(x, y)
{
    int i, dx, dy;
    Object *o;

    if (grid)
        snap(&x, &y);
    /* find reference point and compute distance moved */
    dx = dy = (actn_obj == A_EDIT ?
        0 : 10000);
    for (i = 0; i < last_object; i++) {
        o = &objects[i];
        if (o->select) {
            if (actn_obj == A_EDIT) {
                dx = max(o->r, dx);
                dy = max(o->b, dy);
            } else {
                dx = min(o->l, dx);
                dy = min(o->t, dy);
            }
        }
    }
    dx = x - dx;
    dy = y - dy;
    /* do it to all selected items, de-selecting as you go */
    for (i = 0; i < last_object; i++) {
        o = &objects[i];
        if (o->select) {
            o->select = 0;
            highlight(o, 0);
            switch (actn_obj) {
            case A_COPY:
                highlight(o, 0);
                add_object(o->type, o->l + dx, o->t + dy,
                    o->r + dx, o->b + dy, o->data);
                break;
            case A_MOVE:
                draw_object(o, 0);
                o->l += dx;
                o->t += dy;
                o->r += dx;
                o->b += dy;
                draw_object(o, 1);
                break;
            case A_EDIT:
                draw_object(o, 0);
                set_coords(o, o->l, o->t, o->r + dx, o->b + dy);
                draw_object(o, 1);
                break;
            }
        }
    }
    /* deselect all and reset the mouse */
    actn_obj = 0;
    slct_cnt = 0;
    mouse_shape(M_POINTER);
    msg("");
    check_menu();
}

/* read objects from a file */
read_objects()
{
    char type;
    int t, l, r, b;
    FILE *f = fopen(filename, "r");

    if (f != NULL) {
        last_object = 0;
        while (fgets(text, 100, f)) {
            sscanf(text, "%c %d %d %d %d '%[^']\n",
                &type, &l, &t, &r, &b, text);
            add_object(type, l, t, r, b, text);
        }
        fclose(f);
        msg("%d objects loaded", last_object);
    } else
        msg("can't open '%s'", filename);
}

/* write objects to a file */
write_objects()
{
    int i;
    Object *o;
    FILE *f;

    if (*filename == 0 && !get_name())
        return;
    if ((f = fopen(filename, "w")) != NULL) {
        for (i = 0; i < last_object; i++) {
            o = &objects[i];
            fprintf(f, "%c %d %d %d %d '%s'\n",
                o->type, o->l, o->t, o->r, o->b,
                o->type == TEXT ?
o->data : ""); } fclose(f); } else msg("can't write '%s'", filename); } /* save the given string in malloc'ed memory */ char * strsave(s) char *s; { char *malloc(); char *r = malloc(strlen(s)+1); if (r) strcpy(r, s); else msg("out of memory!!!"); return r; } /* re--display all the objects on the screen */ refresh() { int i, x, y, gy; Object *o; clearviewport(); setcolor(MaxColor); if (grid) { gy = grid/2; for (x = grid; x < Maxx; x += grid) for (y = gy; y select) highlight(o, 1); } } /* (de)highlight the current selected item */ highlight(o, color] object *o; { setcolor(color); rectangle(o-->l--2, o-->t--2, o-->l+2, o-->t+2); rectangle(o-->r--2, o-->b--2, o-->r+2, o-->b+2); } /* give the user some feedback */ msg(fmt, a, b, c, d) char *fmt; { static int lastback = 0; setfillstyle(EMPTY_FILL, 0); bar(0, 0, lastback, 8); sprintf(text, fmt, a, b, c, d); setcolor(MaxColor); outtextxy(0, 0, text); lastback = strlen(text) * 8; } /* * object handling */ /* see if x, y are in an object, begin looking at start + 1 */ in_object(x, y) { static int last = 0; int l, r, t, b; Object *o; int i = last+1, n = last_object; while (n-) { if (i >= last_object) i = 0; o = &objects[i]; l = min(o-->l, o-->r); r = max(o-->l, o-->r); t = min(o-->t, o-->b); b = max(o-->t, o-->b); if (x >= l && x <= r && y >= t && y <= b) return (last = i); i++; } return (last = --1); } /* add an object to the object table */ add_object(type, l, t, r, b, data) char *data; { Object *o = &objects[last_object++]; char *s; o-->type = type; set_coords(o, l, t, r, b); o-->select = 0; if (type == TEXT) o-->data = strsave(data); draw_object(o, 1); } /* set the coordinates properly */ set_coords(o, l, t, r, b) Object *o; { if (o-->type == LINE) { /* no fixup on these */ o-->l = l; o-->t = t; o-->r = r; o-->b = b; } else { o-->l = min(l, r]; o-->t = min(t, b); o-->r = max(l, r); o-->b = max(t, b); } } /* draw an object on the screen */ draw_object(o, color) Object *o; { int x, y, xrad, yrad; setcolor(color); switch 
 (o->type) {
    case TEXT:
        x = strlen(o->data) * 8;
        setfillstyle(EMPTY_FILL, 0);
        bar(o->l, o->t, o->l + x, o->t + 8);
        outtextxy(o->l, o->t, o->data);
        break;
    case BOX:
        rectangle(o->l, o->t, o->r, o->b);
        break;
    case LINE:
        line(o->l, o->t, o->r, o->b);
        break;
    case ELLIPSE:
        x = o->l + (o->r - o->l)/2;
        y = o->t + (o->b - o->t)/2;
        xrad = o->r - x;
        yrad = o->b - y;
        ellipse(x, y, 0, 360, xrad, yrad);
        break;
    }
}

/* delete an object */
kill_object()
{
    int i, j;
    Object *o;

    for (i = j = 0; i < last_object; i++) {
        o = &objects[i];
        if (o->select) {
            highlight(o, 0);
            draw_object(o, 0);
            o->select = 0;
        } else {
            if (i > j)
                objects[j++] = objects[i];
            else
                j++;
        }
    }
    last_object = j;
    slct_cnt = 0;
    check_menu();
}

/* start an edit on the selected objects */
start_edit()
{
    int i;
    Object *o;

    /* edit the text objects now */
    for (i = 0; i < last_object; i++) {
        o = &objects[i];
        if (o->type == TEXT && o->select) {
            o->select = 0;
            highlight(o, 0);
            draw_object(o, 0);
            strcpy(text, o->data);
            if (form("edit: %20s", text)) {
                free(o->data);
                o->data = strsave(text);
                o->r = o->l + strlen(text)*8;
            }
            draw_object(o, 1);
            slct_cnt--;
        }
    }
    if (slct_cnt > 0) {     /* must be other stuff */
        start_actn(A_EDIT);
    }
    check_menu();
}

/* initiate an action on selected objects */
start_actn(actn)
{
    switch (actn) {
    case A_COPY: msg("copy to..."); break;
    case A_MOVE: msg("move to..."); break;
    case A_EDIT: msg("editing..."); break;
    }
    actn_obj = actn;
    mouse_shape(M_CROSS);
}

Spiffier Windows For Turbo C

Tony Servies

Tony Servies is a programmer/analyst with World Computer Systems in Oak Ridge, Tennessee. Presently he is working on a project to develop computer-based training programs for the U.S. Navy. His computer interests include utilities and C programming. You may contact him at Route 1, Box 143, Greenback, TN 37742.

Want to spice up your user interface with flashy windows using only a minimal amount of coding and time? With a few lines of code and Borland's Turbo C, it's possible.
Turbo C Window Interface

Two functions can be used to create text windows in Turbo C: gettext() and puttext(). The functions get a screen image and put an image to the screen, respectively. The programmer supplies only the window coordinates and a character string pointer (or character array, if you will); the function does the rest. These remarkable routines do some rudimentary screen I/O quickly and cleanly. One drawback, though, is the inherent lag between the time you write to the window area and the moment the text is displayed. The user 'sees' any text writing that you perform. Most applications today require a window flashed on the screen intact, such as in a pull-down menu. The code in Listing 1 allows writing to a window before it is displayed on the screen. Then when the window is flashed on the screen, it is complete. You call puttext_write() with the x,y coordinates, the window size, the character string to display, the attribute for the string, and the pointer to the window buffer. The x,y coordinates start from the upper left corner at location 0,0. The size of the window is given as the number of columns (width) and the number of rows (height). The string to display is simply a character string stored in standard C format; a '\0' character terminates the string. The buffer is a pointer to an area of characters that denotes the window area. The string attribute is the usual color attribute found in almost every reference manual on PCs.

How It Works

puttext_write() first checks that you are not positioning the data beyond the physical bounds of the window. Of course, this routine will wrap any text past the end of a line onto the following line (unless it is the last line in the window). The routine then gets the pointer address for the last character and attribute pair in the window area, called maxbuffer in the subroutine. The offset for the proper x,y location is added to the buffer so that it points to the correct character.
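The offset arithmetic is worth seeing in isolation: cells are stored row-major, two bytes per cell (character, then attribute), so the cell at column x, row y starts at byte ((y * xsize) + x) * 2. A small sketch with hypothetical helper names (these are mine, not part of the article's listings):

```c
/* Byte offset of the character at (x, y) in an xsize-wide text
   buffer where each cell is a character/attribute pair. */
int cell_offset(int x, int y, int xsize)
{
    return ((y * xsize) + x) * 2;
}

/* Store one character and its attribute at (x, y). */
void put_cell(char *buffer, int xsize, int x, int y, char ch, char attr)
{
    char *p = buffer + cell_offset(x, y, xsize);
    p[0] = ch;      /* character byte */
    p[1] = attr;    /* attribute byte */
}
```

This is the same layout the PC text screen itself uses, which is why a buffer filled this way can be slapped onto the display in one puttext() call.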
While the buffer location is less than the maxbuffer pointer and the character in the string is not the end-of-string terminator ('\0'), the while loop updates the character and attribute in the buffer. The loop terminates only when the buffer overflows (buffer >= maxbuffer) or at the end of the string (*string == '\0'). Now, just put the window on the screen and you're ready to go. I've included a quick and dirty sample program illustrating the flashy windows routine (Listing 2). Note how easy it is to create a window. Just use a character array of XSIZE*YSIZE*2 bytes. (You multiply the area by two because each displayed character is followed by a byte of attribute information (color, blink, etc.).) The program then clears the window and sets all of the attributes. In this example I set all attributes to magenta characters on a cyan background. Then the routine that does the actual call to puttext_write() loops through ten times. After the page is full, I put the window on the screen with the puttext() command and wait for half a second. The routine loops through nine more times until it completes the for loop. I then restore the original screen with the puttext() call for the original screen area (oldbuffer). This routine should enable you to enhance your pull-down, pop-up, and user-entry screens. Feel free to modify the code to account for border areas, highlighted text, etc.

Listing 1

puttext_write (x,y,xsize,ysize,string,attr,buffer)
int x,y,xsize,ysize;
char *string, attr, *buffer;
{
    char *maxbuffer;

    if (x >= xsize || y >= ysize)   /* Range Errors */
        return;
    maxbuffer = buffer+(xsize*ysize*2)-1;
    /* maxbuffer points to the attribute of the last character */
    buffer += (((y*xsize)+x)*2);
    /* buffer points to the first character to write */
    /* While buffer is not overrun and there are characters left
     * to print.
     */
    while ((buffer < maxbuffer) && (*string != '\0')) {
        *buffer++ = *string++;  /* Do character */
        *buffer++ = attr;       /* Do attribute */
    }
}

Listing 2

#include <stdio.h>
#include <conio.h>
#include <dos.h>

#define XSIZE 50
#define YSIZE 15

char newbuffer[XSIZE * YSIZE * 2];  /* Allow for Attributes */
char oldbuffer[XSIZE * YSIZE * 2];

main()
{
    int i, j;
    char key_string[15];

    /* Get the existing screen area and store it in oldbuffer.
     * Subtract 1 from size, since the 1st position is 0.
     */
    gettext (5,5,5+XSIZE-1,5+YSIZE-1,oldbuffer);
    /* Clear the new window area (newbuffer) */
    for (i = 0; i < YSIZE; i++) {
        for (j = 0; j < XSIZE*2; j+=2) {
            newbuffer[i*XSIZE*2+j] = ' ';       /* Blank Space */
            newbuffer[i*XSIZE*2+j+1] = '\35';   /* Attribute */
        }
    }
    /* Loop through 10 times */
    for (j = 0; j < 10; j++) {
        /* Print YSIZE lines */
        for (i = 0; i < YSIZE; i++) {
            sprintf(key_string,"Value %.3d",i+(j*(int)YSIZE));
            puttext_write(1,i,XSIZE,YSIZE,key_string,'\35',newbuffer);
        }
        /* Show it on the screen */
        puttext(5,5,5+XSIZE-1,5+YSIZE-1,newbuffer);
        delay(500);
    }
    /* Restore the original screen */
    puttext(5,5,5+XSIZE-1,5+YSIZE-1,oldbuffer);
}

The C Programmer's Reference: A Bibliography Of Periodicals

Harold Ogg

This article is not available in electronic form.

Standard C Formatted Input

P.J. Plauger

P.J. Plauger has been a prolific programmer, textbook author, and software entrepreneur. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standards committee. This is the fourth in a series of columns on input and output under Standard C. (See "Evolution of the C I/O Model," CUJ August '89, "Streams," CUJ October '89, and "Formatted Output," CUJ November '89.) The topic this month is how to perform formatted input. You can think of it as a natural, but not essential, companion to formatted output. As I emphasized last month, you really must perform output somewhere in every program that you write.
If the output is to be directly digestible by human beings, as is often the case, then you want the program to produce readable text. The formatted output functions help you produce readable text that reflects the values of encoded data in your program. On the other hand, not all programs read input. Those that do can read data directly, using an assortment of standard library functions, and interpret it as they see fit. Converting small integers and text strings for internal consumption are both five-finger exercises that most C programmers perform easily. It is only when you must convert floating point values, or recognize a complex mix of data fields, that standard scanning functions begin to look attractive. Even then the choice is not always clear. The usability of a program depends heavily on how tolerant it is of variations in user input. You as a programmer may not agree with the conventions enforced by the standard formatted input functions. You may not like the way they handle errors. In short, you are much more likely to want to roll your own input scanner. Obtaining formatted input is not simply the inverse of producing formatted output. With output, you know what you want the program to generate next and it does it. With input, however, you are more at the mercy of the person producing the input text. Your program must scan the input text for recognizable patterns, then parse it into separate fields. Only then can it determine what to do next. Not only that, the input text may contain no recognizable pattern. You must then decide how to respond to such an "error." Do you print a nasty message and prompt for fresh input? Do you make an educated guess and bull ahead? Or do you abort the program? Various canned input scanners have tried all of these strategies. No one of them is appropriate for all cases. It is no surprise, therefore, that the history of the formatted input functions in C is far more checkered than that of the formatted output functions.
Most implementations of C have long agreed on the basic properties of printf and its buddies. (A notable exception is the I/O library I originally wrote for the Whitesmiths C compiler. It nicely regularized the names of functions and format conversion specifications, but at a serious cost in compatibility. Eventually, we had to abandon our special dialect of I/O.) By contrast, scanf and its ilk have changed steadily over the years and have proliferated dialects.

Committee X3J11 spent considerable time sorting out the proper behavior of formatted input. Once we agreed on which input conversions to include in Standard C, we had to agree on exactly what they did. Implementations varied on the valid formats for numeric fields. They were all over the map on how to respond to invalid input. They seldom clarified how scanf interacts with ungetc and other I/O functions.

All these decisions had to be made in an atmosphere of general dissatisfaction. A vocal minority wanted major changes in the formatted input functions. An almost silent majority didn't want to be bothered with details about functions they considered useless at best, dangerous at worst. Given all these handicaps, I think X3J11 did rather a good job of clarifying the formatted input functions and making them useful.

After that introduction, I will rashly assume that you still care about the formatted input functions. The rest of this column discusses the scan functions, so called because they all have scan as part of their names. These are the functions that scan input text and convert text fields to encoded data. All are declared in the standard header <stdio.h>. To use the scan functions, you must know how to call them, how to specify conversion formats, and what conversions they will perform for you.
Calling Scan Functions

The Standard C library provides three different scan functions, declared as follows:

int fscanf(FILE *stream, const char *format, ...);
int scanf(const char *format, ...);
int sscanf(const char *src, const char *format, ...);

The function fscanf obtains characters from the stream stream. The function scanf obtains characters from the stream stdin. Both stop scanning input early if an attempt to obtain a character sets the end-of-file or error indicator for the stream. The function sscanf obtains characters from the null-terminated string beginning at src. It stops scanning input early if it encounters the terminating null character for the string.

Note that all of the functions accept a varying number of arguments, just like the print functions. And just like the print functions, you had better declare any scan functions before you use them by including <stdio.h>. Otherwise, some implementation may go crazy when you call your undeclared scan function.

All the functions accept a read-only format argument, which is a pointer to a null-terminated string. The format tells the function what additional arguments to expect, if any, and how to convert input fields to values to be stored. (A typical argument is a pointer to a data object that receives the converted value.) It also specifies any literal text or whitespace you want to match between converted fields. If scan formats sound remarkably like print formats, the resemblance is quite intentional. But there are also important differences. I will revisit formats in considerable detail later in this column.

All the functions return a count of the number of text fields converted to values that are stored. If any of the functions stops scanning early for one of the reasons cited above, however, it returns the value of the macro EOF (defined in the standard header <stdio.h>). Since EOF must have a negative value, you can easily distinguish it from any valid count, including zero.
Note, however, that you can't tell how many values were stored before an early stop. If you need to locate a stopping point more precisely, break your scan call into multiple calls.

A scan function can also stop scanning because it obtains a character that it is unprepared to deal with. In this case, the function returns the cumulative count of values converted and stored. You can determine the largest possible return value for any given call by counting all the conversions you specify in the format. The actual return value will be between zero and this maximum value, inclusive.

When either fscanf or scanf obtains such an unexpected character, it pushes it back to the input stream. (It also pushes back the first character beyond a valid field when it has to peek ahead to determine the end of the field.) How it does so is similar to calling the function ungetc. There is a very important difference, however. You cannot portably push back two characters to a stream with successive calls to ungetc (and no other intervening operations on the stream). You can portably follow an arbitrary call to a scan function with a call to ungetc for the same stream. What this means effectively is that the one-character pushback limit imposed on ungetc is not compromised by calls to the scan functions. Either the implementation guarantees two or more characters of pushback to a stream or it provides separate machinery for the scan functions.

Note that the scan functions push back at most one character. Say, for example, that you try to convert the field 123EASY as a floating point value. The field is, of course, invalid. Even the subfield 123E is invalid, since the conversion requires at least one exponent digit. What happens is that the subfield 123E is consumed and the conversion fails. No value is stored and the scan function returns. The next character to read from the stream is A. This behavior matters most for floating point fields, which have the most ornate syntax.
Other conversions can usually digest all the characters in the longest subfield that looks valid.

As a final point, the Standard C library does not provide any of the functions vfscanf, vscanf, or vsscanf. These are obvious analogs to the print functions vfprintf, vprintf, and vsprintf which I described last month. X3J11 simply felt that there was not enough call for such scan functions to require them of all implementations.

Writing Formats

Last month, I described the print formats as a mini programming language. The same is, of course, true of the scan formats. I also commented earlier that print and scan formats look remarkably alike. This should serve as both a comfort and a warning to you.

The comfort is that the print and scan functions are designed to work together. What you write to a text file with one program should be readable as a text file by another. Any values you represent in text by calling a print function should be reclaimable by calling a scan function. (At least they should be to good accuracy, over a reasonable range of values.) You would even like the print and scan formats to resemble each other closely.

Doug McIlroy, at AT&T Bell Laboratories, makes a stronger statement. He feels that any good formatted I/O package should let you write identical formats for print and scan function calls. A formatting language that is not symmetric, he feels, is deficient. I believe that Standard C comes close to achieving this goal. It is at least possible for you to write symmetric formats (those that read back what you wrote out). Be warned, however, that developing symmetry can take a bit of extra thought.

And here lies the danger. The fact remains that the print and scan format languages are different. Sometimes the apparent similarity is only superficial. You can write text with a print function call that does not scan as you might expect with a scan function call using the same format.
Be particularly wary when you print text using conversions with no intervening whitespace. Be somewhat wary when you print adjacent whitespace in two successive print calls. The scan functions tend to run together fields that you think of as separate.

The basic operation of the scan functions is, indeed, the same as for the print functions. Call a scan function and it scans the format string once from beginning to end. As it recognizes each component of the format string, it performs various operations. Most of these operations consume characters sequentially from a stream (fscanf or scanf) or from a string stored in memory (sscanf). Many of these operations generate values that the scan function stores in various data objects that you specify with pointer arguments. Any such arguments must appear in the varying length argument list, in the order in which the format string calls for them. For example,

sscanf("thx 1138", "%s%2o%d", a, &b, &c);

stores the string "thx" in the char array a, the value 9 (octal eleven) in the int data object b, and the value 38 in the int data object c. It is up to you to ensure that the type of each actual argument pointer matches the type expected by the scan function. (The pointer must, of course, also point to a data object of the expected type.) Standard C has no way to check the types of additional arguments in a varying length argument list.

Not every part of a format string calls for the conversion of a field and the consumption of an additional argument. In fact, only certain conversion specifications gobble arguments. Every conversion specification begins with the % escape character and matches one of the patterns described below. The scan functions treat everything else either as whitespace or as literal text. Whitespace in a scan format, by the way, is whatever the standard library function isspace (declared in <ctype.h>) says it is. That can change if you call the function setlocale (declared in <locale.h>) before you call the scan function.
Your program begins execution in the "C" locale, where whitespace is what you have learned to know and love. A sequence of one or more whitespace characters in a scan format is treated as a single entity. It consumes an arbitrarily long sequence of whitespace characters from the input. (Again, whitespace is whatever the current locale says it is.) The whitespace in the format need not resemble the whitespace in the input in any way. The input may contain no whitespace at all. Whitespace in the format simply guarantees that the next input character (if any) is not a whitespace character.

Any character in the format that is not whitespace and not part of a conversion specification calls for a literal match. The next input character must match the format character. Otherwise, the scan function returns with the current count of converted values stored. A format that ends with a literal match can produce ambiguous results. You cannot determine from the return value whether the trailing match failed. Similarly, you cannot determine whether a literal match failed or a conversion that follows it. For these reasons, literal matches have only limited use in scan formats.

For completeness, I should point out that a literal match can be any string of multibyte characters. Each sequence of literal text must begin and end in the initial shift state, if your target environment uses a state-dependent encoding for multibyte characters. I suspect, however, that you will have little need to match Kanji characters with scan formats in the next few years.

Conversion Specifications

A scan conversion specification differs from a print conversion specification in fundamental ways. You cannot write any of the print conversion flags and you cannot write a precision (following a decimal point). On the other hand, scan conversions have an assignment-suppression flag and a conversion specification called a scan set. Following the % you write three components. All but the last component are optional.
In order: You write an optional asterisk (*) to specify that the converted value is not to be stored. You write an optional field width to specify the maximum number of input characters to match when determining the conversion field. The field width is an unsigned decimal integer. Many conversions skip any leading whitespace, which is not counted as part of the field width. Finally, you write a conversion specifier to determine the type of any argument, how to determine its conversion field, and how to convert the value to store.

You write a scan set conversion specifier between brackets ([]); all other conversion specifiers consist of one- or two-character sequences from a predefined list of about three dozen valid sequences. The two-character sequences begin with an h, l, or L, to indicate alternate argument types. I describe scan sets and list all valid sequences in Table 1. Don't write anything else in a scan format if you want your code to be portable.

The goal of each formatted input conversion is to determine the sequence of input characters that constitutes the field to convert. The scan function then converts the field, if possible, and stores the converted value in the data object designated by the next pointer argument. (If assignment is suppressed, no function argument is consumed.) Unless otherwise specified below, each conversion first skips arbitrary whitespace in the input. Skipping is just the same as for whitespace in the scan format. The conversion then matches a pattern against succeeding characters in the input to determine the conversion field. You can specify a field width to limit the size of the field. Otherwise, the field extends to the last character in the input that matches the pattern. The scan functions convert numeric fields by calling one of the standard library functions strtod, strtol, or strtoul (declared in <stdlib.h>). A numeric conversion field matches the longest pattern acceptable to the function it calls.
Scan Sets

A scan set behaves much like the s conversion specifier. It stores up to w characters (default is the rest of the input) in the array of char pointed at by ptr. It always stores a null character after any input. It does not, however, skip leading whitespace. It also lets you specify what characters to consider as part of the field. You can specify all the characters to match, as in:

"%[0123456789abcdefABCDEF]"

which matches an arbitrary sequence of hexadecimal digits. Or you can specify all the characters that do not match, as in:

"%[^0123456789]"

which matches any characters other than digits. If you want to include the right bracket (]) in the set of characters you specify, write it immediately after the opening [ (or [^). You cannot include the null character in the set of characters you specify.

Some implementations may let you specify a range of characters by using a minus sign (-). The list of hexadecimal digits, for example, can be written as:

"%[0-9abcdefABCDEF]"

or even, in some cases, as:

"%[0-9a-fA-F]"

Please note, however, that such usage is not universal. Avoid it in a program that you wish to keep maximally portable.

Table 1: Conversion Specifiers

In the descriptions that follow, I summarize the match pattern and conversion rules for each valid conversion specifier. w stands for the field width you specify, or the indicated default value if you specify no field width. ptr stands for the next argument to consume in the varying length argument list:

c -- stores w characters (default is 1) in the array of char whose first element is pointed at by ptr. It does not skip leading whitespace.

d -- converts the integer input field by calling strtol with a base of 10, then stores the result in the int pointed at by ptr.

hd -- converts the integer input field by calling strtol with a base of 10, then stores the result in the short pointed at by ptr.
ld -- converts the integer input field by calling strtol with a base of 10, then stores the result in the long pointed at by ptr.

e -- converts the floating point input field by calling strtod, then stores the result in the float pointed at by ptr.

le -- converts the floating point input field by calling strtod, then stores the result in the double pointed at by ptr.

Le -- converts the floating point input field by calling strtod, then stores the result in the long double pointed at by ptr.

E -- is the same as e. lE -- is the same as le. LE -- is the same as Le.

f -- is the same as e. lf -- is the same as le. Lf -- is the same as Le.

g -- is the same as e. lg -- is the same as le. Lg -- is the same as Le.

G -- is the same as e. lG -- is the same as le. LG -- is the same as Le.

i -- converts the integer input field by calling strtol with a base of zero, then stores the result in the int pointed at by ptr. (A base of zero lets you write input that begins with 0, 0x, or 0X to specify an actual numeric base other than 10.)

hi -- converts the integer input field by calling strtol with a base of zero, then stores the result in the short pointed at by ptr.

li -- converts the integer input field by calling strtol with a base of zero, then stores the result in the long pointed at by ptr.

n -- converts no input, but stores the cumulative number of matched input characters in the int pointed at by ptr. It does not skip leading whitespace.

hn -- converts no input, but stores the cumulative number of matched input characters in the short pointed at by ptr. It does not skip leading whitespace.

ln -- converts no input, but stores the cumulative number of matched input characters in the long pointed at by ptr. It does not skip leading whitespace.

o -- converts the integer input field by calling strtoul with a base of eight, then stores the result in the unsigned int pointed at by ptr.
ho -- converts the integer input field by calling strtoul with a base of eight, then stores the result in the unsigned short pointed at by ptr.

lo -- converts the integer input field by calling strtoul with a base of eight, then stores the result in the unsigned long pointed at by ptr.

p -- converts the pointer input field, then stores the result in the void * pointed at by ptr. Each implementation defines its pointer input field to be consistent with pointers written by the print function.

s -- stores up to w non-whitespace characters (default is the rest of the input) in the array of char pointed at by ptr. It first skips leading whitespace, and it always stores a null character after any input.

u -- converts the integer input field by calling strtoul with a base of 10, then stores the result in the unsigned int pointed at by ptr.

hu -- converts the integer input field by calling strtoul with a base of 10, then stores the result in the unsigned short pointed at by ptr.

lu -- converts the integer input field by calling strtoul with a base of 10, then stores the result in the unsigned long pointed at by ptr.

x -- converts the integer input field by calling strtoul with a base of 16, then stores the result in the unsigned int pointed at by ptr.

hx -- converts the integer input field by calling strtoul with a base of 16, then stores the result in the unsigned short pointed at by ptr.

lx -- converts the integer input field by calling strtoul with a base of 16, then stores the result in the unsigned long pointed at by ptr.

X -- is the same as x. hX -- is the same as hx. lX -- is the same as lx.

% -- converts no input, but matches a percent character (written %% in the format).

Doctor C's Pointers (R)

The Memory Management Library

Rex Jaeschke

Rex Jaeschke is an independent computer consultant, author and seminar leader.
He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementers of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA, 22091 or via UUCP at uunet!aussie!rex. The C run-time library has long had a family of routines that enable a programmer to allocate and free memory at run-time, at his pleasure. This capability is a powerful one and was adopted (and somewhat expanded) in ANSI C. Oftentimes you define an array of elements (necessarily of fixed size) only to find that, in most cases, you don't use all the elements or that, in some cases, you need just a few more. What you need is the ability to have variable sized arrays. However, according to the definition of C, the dimension of an array in a definition must be a compile-time integer constant. That is, the C language does not support such constructs. (Note that the Numerical C Extensions Group, of which I am the convener, is investigating the possibility of adding such a construct.) However, this idea can be implemented using the memory allocation routines in the standard library. The beauty of these allocation routines is twofold: the programmer determines just when space is allocated and exactly how long it is kept, and, if the program is written correctly, you can change the manner in which the space is allocated and freed, transparently. Let's discuss the second point further. ANSI C defines the term storage duration by saying "An object has a storage duration that determines its lifetime. There are two storage durations: static and automatic." I prefer to also add a third duration, dynamic. An object having dynamic storage duration is one allocated by the programmer using the library. (For the purposes of this discussion, the address space from which dynamic objects are allocated will be referred to as the heap. 
This term is widely used for this purpose but is not used in the ANSI C Standard.) Consider the following example:

#include <stdlib.h>

void f()
{
    char c1[100];
    static char c2[100];
    char *c3;

    c3 = malloc(100);

    c1[10] = 'a';
    c2[10] = 'a';
    c3[10] = 'a';
}

Ignoring the possibility of malloc() failing to allocate memory, c1, c2, and c3 can be used to designate the automatic, static, and dynamic arrays, respectively. Since the notation for referencing all three arrays is identical, the executable code can be ignorant of the object's storage duration. You can change from automatic to dynamic, from dynamic to static, etc., with no real impact on the code, if you design it appropriately to begin with.

The allocation functions somehow magically change the address space of our program at run-time. The way in which this is done is specific to an implementation and may vary widely. In any case, an understanding of such details is unnecessary to use the allocation functions effectively. All you need know is that if they succeed, the requested space is allocated contiguously and you are given a base address.

The Parent Header

In the not too distant past, there were only four or five "standard" headers. Apart from those, there was wide variation as to which functions were provided and in which header (if any) they were declared. ANSI C requires the allocation functions to be declared in the header stdlib.h. Many implementations currently declare them in malloc.h as well as, or instead of, stdlib.h. I have also seen quite a lot of old code that contained explicit declarations for these functions, presumably because no header in their implementation contained them. As a result of ANSI C, the declarations of these functions have changed with regard to both return and argument types. ANSI C adopted the concept of a void pointer from C++.
This solved two important issues: it provided a bridge for porting code across byte and word (and other) architectures where different pointer types may actually have different physical representations, and secondly, it provided a way to represent a generic pointer, one that simply contained an address of some (unknown) object type. Since the allocation routines are not given any information about the type of object a programmer wishes to store in the allocated space, the pointers used and returned by these functions were prime candidates for type void *.

A consequence of this is that the returned value no longer need be explicitly cast. For example, in the following case:

int *pi;

pi = (int *)malloc(10 * sizeof(int));
pi = malloc(10 * sizeof(int));

the assignments are equivalent, since a void pointer is assignment-compatible with all other pointer types. (Historically, it was common to see such casts even though they generally were not needed. That is, strict pointer assignment-compatibility checking was not enforced as is now required by ANSI.) If some of your code explicitly declares the allocation functions as having return types of char *, without such casts you will get errors when compiling in strict ANSI mode if the target of the assignment has a type other than char *. The best solution to this is to remove the explicit declaration and include stdlib.h instead.

With ANSI's adoption of function prototypes from C++, stdlib.h now describes the allocation routines' argument type information as well. Again, all pointer types here have type void *, but this is of no consequence since any "real" pointer type is compatible with void * and, as such, objects of such type can be passed in.

ANSI C has invented the type size_t, the type of a sizeof expression. This type is typedefed in numerous standard headers, including stdlib.h, and is used in various library function prototypes (including the allocation functions) for the type of sizes and counts.
Since sizes and counts can never be negative, size_t is an unsigned integer type. However, the underlying type of size_t is implementation-defined and may be unsigned int or unsigned long. Historically, descriptions of the allocation functions stated that sizes and counts had type unsigned int.

The Allocation Functions

calloc

#include <stdlib.h>
void *calloc(size_t nmemb, size_t size);

calloc() allocates contiguous space for nmemb objects, each of whose size is size. The space allocated is initialized to all-bits-zero. Note that this is not guaranteed to be the same representation as floating-point zero or the null pointer constant NULL.

free

#include <stdlib.h>
void free(void *ptr);

free() causes the space (previously allocated by calloc(), malloc(), or realloc()) pointed to by ptr to be freed. If ptr is NULL, free() does nothing. Otherwise, if ptr is not a value previously returned by one of these three allocation functions, the behavior is undefined. The value of a pointer that refers to space that has been freed is indeterminate, and such pointers should not be dereferenced. Note that free() has no way to communicate an error if one is detected.

On some systems, most noticeably MS-DOS, freed space may not actually be given back to the operating system. (It likely will, however, be available for future allocations within that program.) It might only really be released when the program terminates. One consequence of this is that if you try to execute another program from within a running program that has freed up memory using free(), there still might not be sufficient physical memory available to start the new program.

malloc

#include <stdlib.h>
void *malloc(size_t size);

malloc() allocates contiguous space for size bytes. The space allocated has no guaranteed initial value.

realloc

#include <stdlib.h>
void *realloc(void *ptr, size_t size);

realloc() changes the size of the space pointed to by ptr to have size size. If ptr is NULL, realloc() behaves like malloc().
Otherwise, if ptr is not a value previously returned by calloc(), malloc(), or realloc(), the behavior is undefined. The same is true if ptr points to space that has been freed. size is absolute, not relative. If size is larger than the size of the existing space, new uninitialized contiguous space is allocated at the end; the previous contents of the space are preserved. If size is smaller, the excess space is freed; however, the contents of the retained space are preserved. If realloc() cannot allocate the requested space, the contents of the space pointed to by ptr remain intact. If ptr is non-NULL and size is 0, realloc() acts like free().

Whenever the size of space is changed by realloc(), the new space may begin at an address different from the one given it, even when realloc() is truncating. Therefore, if you use realloc() in this manner, you must beware of pointers that point into this possibly-moved space. For example, if you build a linked list there and use realloc() to allocate more (or less) space for the chain, it is possible that the space will be "moved," in which case the pointers now point to where successive links used to be, not where they are now. You should always use realloc() as follows:

ptr1 = realloc(ptr, new_size);
if (ptr1 != NULL) {
    ptr = ptr1;
    ...
}

This way, you never care whether the object has been relocated, since you always update ptr each call to point to the (possibly new) location.

General Comments

The way in which a heap is physically organized can vary widely. On some systems, the stack and the heap (and possibly even the static data area) share the same address space. On others, each may have its own address space. Some MS-DOS implementations provide both near and far heaps. Historically, many C implementations have permitted the allocation of zero bytes to be successful. That is, a non-NULL pointer is returned.
Since ANSI C does not permit zero-sized objects to be defined, this practice was hotly debated during X3J11 deliberations. As a compromise, if you attempt to allocate zero bytes, it is implementation-defined whether a null pointer or a unique pointer is returned.

We are told that if an allocation attempt fails, NULL is returned. The common approach I've seen to this is to display some error message and call exit(). However, most applications I have seen could ill afford to actually do this, since it would leave disk files or shared-memory data areas compromised. For example, if you cannot get more dynamic space, you may have quite some work to undo your current situation before you can gracefully terminate or continue. On the other hand, failure to allocate more memory when doing an in-memory sort can simply be handled by writing the sorted tree to disk, freeing the memory, and starting on the next set of strings. In such cases, the failure to allocate memory is not fatal. In cases where it is, you must consider the ramifications of receiving a NULL return at design time, not during maintenance when the first failure occurs.

When heap allocation fails, it might well be useful to find out how much you can get. Unfortunately, ANSI C does not provide this capability. Several implementations (including Microsoft's) do provide some help in this area. Either they can tell you how much is available now in one allocation, or how many allocations you can make of a given size. (The two need not add up to the same number of bytes, since each time you request bytes, extra bytes may also be fetched to help manage the space allocated.) Similarly, ANSI C provides no help in debugging heap-related problems by "walking the heap links" and the like. Again, it's up to the quality of the implementation.

On some systems (VAX/VMS, for example) the cost of allocating memory dynamically can be somewhat expensive. As such, a caching approach may be taken.
That is, when you free memory, the larger of the freed block and the currently held cache is kept. The idea is that if you alternately allocate and free, each new allocation will have some chance of getting memory from the freed cache.

ANSI C guarantees that any non-NULL address returned by the allocation functions will be aligned appropriately so it can be dereferenced via any pointer type. On systems that require object alignment, this means that space is allocated in multiples of some cluster value (machine words, for example). On such systems, more memory may actually be allocated than you requested. If your program contains a bug and copies (slightly) beyond the end of allocated memory, the bytes overwritten may be those extra ones and no error occurs. However, if you change the request by a few extra bytes, the bug may manifest itself. The most common example I see is as follows:

char name[30];

getname(name);
pc = malloc(strlen(name));
strcpy(pc, name);

Here, strcpy() adds a null character to the destination but no space was allocated for it. If the length was odd and malloc() allocates an even number anyway, the problem will not be observed. However, with even length names it may well appear.

It is considered good style to explicitly free allocated memory when you are done with it. Presumably, if you don't, this is done when your program terminates (although ANSI C does not say so). Note that if you "forget" where your allocated space resides (by overwriting the pointer value returned by malloc(), for example), there is no way of getting that address back. One relatively easy way of having this happen is to use:

ptr = realloc(ptr, new_size);

If realloc() fails, you have lost the address of the original area.

An alternate memory allocation system also exists on many systems. It usually involves using sbrk(). The two schemes are incompatible and must not be used in the same program. ANSI C does not include this alternate scheme.
Transparent Heap Usage

It is possible that your program calls the allocation routines even if you don't call them yourself. For example, some library routines might need dynamic space to efficiently handle variable-size amounts of local information. Many systems have a fixed limit on the number of open files they support. However, others do not. They can achieve this by building a linked list of FILE objects using the allocation routines. They may even include stdin, stdout, and stderr in this list, in which case the program startup code may contain calls to malloc(), etc. Compile and link an empty main() program and look at the linker map to see if these library functions are called at startup.

Multi-Dimensional Arrays

Occasionally, it may be necessary to allocate a multi-dimensional array on the heap. This can be done just as easily as for single-dimensioned arrays once you master the required pointer declaration. For example,

double (*pd)[10];

pd = malloc(50 * sizeof(double));
pd[3][2] = 1.234;

By declaring pd to be a pointer to an array of 10 doubles, pd can be subscripted to two levels. pd[3] designates the fourth row of 10 elements and pd[3][2] designates the third column in that row. (If you are confused about the difference between a pointer to double and a pointer to an array of double, you will have to wait for a future column.)

Implementer's Notebook
Life With Static Buffers
Don Libes

This article is not available in electronic form.

Applying C++
Designing And Implementing A Text Editor Using OOP, Part 1
Tsvi Bar-David

Tsvi Bar-David is president of Deerworks and currently a faculty member in the Software Engineering Department at Monmouth College. He received his PhD in mathematics from the University of California at Berkeley. Previously, he was employed at Bell Labs in the development and delivery of UNIX, C++, and Object-Oriented courses.
In my July 1989 column on training for object-oriented programming, I presented a simple framework for object-oriented design. Today we embark upon a journey -- likely to last several columns -- in which we apply the design framework to the problem of constructing a simple text editor. Along the way we will develop some types which are useful not only in building the editor, but also as tools in general, and so can serve as members of a general-purpose object library. Most languages, including C++, require that the solution to a problem be represented as a main program. This we will do. Yet, our goal is not to design and build programs, but rather to identify and construct useful object types, out of which we can construct an infinite number of programs. In a sense akin to mathematics, we are constructing a solution not to just one problem, but rather to a family of related problems -- for example, the problem of editing text. It is precisely this approach to problem solving, I believe, that permits an object-oriented design (design as a noun, the result of the design process) to be easily modified, enhanced, and re-used. The brevity of the main programs that we build reflects this approach; typically these programs instantiate an object or two and then invoke a couple of member functions. Bertrand Meyer [2] takes and supports very much the same position in the Eiffel language. Indeed, Eiffel has no main program; one simply selects a first object to which to send a message. The action associated with that message goes ahead and creates other objects and sends messages to them ad infinitum.

Design Framework

You should refer to the July 1989 column for details of the design framework. In sum, the framework manages a process that maps a requirements document to an implementable design document. To quote the earlier column: "The heart of object-oriented design is the identification of the types in the program and the relationships between them.
To identify a type is to specify its behavior (public interface). To identify relationships means bringing to light the relationships (inheritance and parametric types) in the behavior of the types. One can then implement the behavior in many ways." Here is the pseudo-code for the design process.

initial decomposition(on requirements document);
while( stopping condition has not been met )
{
    abstraction;
    type relationships;
    type decomposition;
}
return design specification;

In order to begin the design process, we need a behavioral description of the object we want to build, namely the text editor.

Describing The Editor

The ced editor allows the user to create new text files or edit existing ones. The editor views the file as just a sequence of characters (thus the 'c' in ced) with no other structure, such as a sequence of lines. Since newline ('\n') is just an ordinary character, we can easily recover the traditional line structure of a file by using ordinary edit operations. In addition, the editor maintains the notion of a current point in the file. The point is regarded as being between two characters. The notion of current point is pretty close to the concept of current offset in UNIX files. At this point, we have to make a requirements decision about the user interface to the editor. For the sake of simplicity, assume that the editor has a traditional command-line interface like edlin on MS-DOS systems or ed on UNIX systems (the input command stream looks like a sequence of lines). Each line consists of an optional integer prefix followed by a character. The table below associates commands with the characters that invoke them. N.B. Bracketed arguments are optional.

[n] g -- Move point to just before the nth character (zero-based). Default value for n is 0.

[n] p -- Print n characters starting at the first character after point, followed by a newline. n defaults to 1. Increment point by n.

i -- Insert an arbitrary number of characters before point.
Terminate insertion with '.' on a line by itself.

[n] d -- Delete n characters starting at the first character after point. n defaults to 1.

[n] y -- Paste whatever was last deleted n times just before point. n defaults to 1.

w [file] -- Write out the internal representation of the file (the buffer) to the named file. The primary default for file is the filename command-line argument to ced. If ced was invoked without a filename, it selects the last file written to.

q -- Exit the editor.

? -- Print out useful information, like filename, point, and size of file.

Normally the editor scans standard input for commands. However, for flexibility, the editor should be able to get its command stream from a file or possibly some other source, like a string or a window. When the editor is invoked with an argument at the command line, such as

prompt> ced filename

the editor opens an existing file for editing or creates an empty file of that name. In either case, point is located just before the first character in the file. If the editor is called without an argument

prompt> ced

it manages an editing session. The user decides how to explicitly write the contents to a named file. A typical edit session might look like:

36g
i
hello there
.
g
50p
w
q

Initial Decomposition

Our task now is to identify the high-level types from the requirements, out of which we will construct the editor. Certainly File is one of these types and is used in two ways: as the file to be created or modified, and as the command stream (typically standard input from the terminal). In our description of the editor write command, we briefly mentioned the internal representation of the file under edit, traditionally known as the buffer. Is the Buffer type synonymous with the File type?
We can answer this question more easily once we have described (the abstraction step) the public interface of both File and Buffer; namely, if the public interfaces (really, the manual pages) of two types are the same, then the types are one and the same. At the risk of getting ahead of ourselves, let's try to answer this question right now. Assume that a File object essentially has the semantics of a standard I/O FILE object (as supported by the standard run-time library of the ANSI C compiler [1]). Files and Buffers may very well share the offset or point concept. On the other hand, whatever a Buffer is, it must support the editor commands listed in the requirements section, particularly insertion and deletion. Yet, there are no native insertion and deletion operators on Files. The operation that puts a character into a file (putc( int, FILE *)) can be considered as inserting only when appending to the file, not if the file offset is anywhere in the middle of the file (it will overwrite the character at the offset). This is not the behavior we are looking for. We conclude then that a Buffer is not a File, and so we must design and implement the Buffer abstraction. Now, we may be able to implement Buffer in terms of File (as some implementations of the full-screen editor will do), but that is merely (yes, merely!) a matter of implementation and is not to be confused with the behavior or semantics of the Buffer. Is the editor itself a type? Even though giving the Editor a type may seem unnatural at first, we will reap the benefits already mentioned. Our design policy is clear, albeit extreme -- everything in the application is an object of one type or another. So what is the behavior of an editor object? An editor object interprets the command stream and performs actions both upon a buffer and the user interface, which for now is just standard output.
That is, the editor coordinates three objects: the input (command) stream, a buffer, and an output stream (a view of the buffer). For simplicity of design, assume that an editor object manages precisely one buffer, which corresponds to at most one file. I say "at most" and not "precisely one" since the edit program ced can be invoked with no arguments. In such a case, the program presumably contains an editor object which manages a buffer, which currently does not correspond to any file. Later on, we will build an edit program based upon the Editor type which manages multiple buffers and files, something in the spirit of emacs. Now that we have identified the object types Editor, Buffer, and File, we must perform the design process on each of the types. We'll start with File since it is the most familiar of the types in our working list. But why even bother representing File as a class when all C++ compilers already support the standard I/O FILE structure? There are several reasons:

Consistency. We want objects of all types in our application -- other than built-in types of the language -- to be represented by classes. This provides developers and maintainers a uniform feel of object orientation. The message expression object.memberfunction() will be the sole means of communicating with an object. Using a standard I/O function like putc( 'a', fp) directly on a FILE pointer (fp) would violate this desideratum.

Insulation. We can regard our File type as an application-specific type layered on top of the environment's existing I/O support. This helps to make the editor more portable. When you port the editor to a new operating system, only the implementation of File need change. Other code that uses File doesn't change one iota. But we can do better. Since every C++ compiler's run-time support library contains FILE, we can just implement/layer File on top of FILE.
Furthermore, there won't be much of a run-time penalty for this layering, if we declare all of the member functions of File to be inline! For File's public interface you need the five classic operations of a minimal interface.

open -- connect the program to the named file or create it.
close -- sever the connection between the program and the file.
iseof -- returns true if at end-of-file, otherwise false.
get -- get a character and advance the file offset.
put -- put a character and advance the file offset.

We can get rid of the explicit open and close member functions elegantly by using a constructor and destructor respectively. The advantage of this approach is that an instantiated File object is guaranteed to be initialized properly. Furthermore, mapping close to the destructor guarantees that when the File object dies (goes out of scope) in the program, the associated file in the file system is automatically closed, without the client programmer having to explicitly close it. The public interface of File as a C++ class is

class File {
public:
    File( char *name = "", char *mode = "r");
    ~File();
    Truth iseof();
    int get();
    void put( int c);
private:
    // data members
};

The constructor takes two arguments, and both are provided with defaults. Here are the intended semantics. The declaration

File f;

invokes the default constructor File( "", "r"), which connects the object f to standard input for reading.

File f( filename);

invokes File( filename, "r") and so opens filename for reading.

File f( filename, mode);

opens filename with some mode (with the same semantics as fopen()). So, for example,

File f( "foo", "w");

opens the file foo for writing. Before we wax too lyrical about the joys of using constructors in place of an open function, we must face a design problem. Just after the constructor runs, how do we know that the file is really open?
If the open failed for any reason (the file doesn't exist, we don't have the correct permissions, etc.), it would be nonsensical to invoke any member function against the object. One solution is to forget the constructor approach and just endow File with an explicit open function of the following form

typedef int Truth;    // boolean type

Truth File::open( char *filename, char *mode);

The open function could report success or failure of the operation in a manner similar to the C/C++ library functions fopen() and open() -- by returning a boolean value (the value is regarded as boolean by convention). However, for those who want to stick with the constructor approach, here is another solution to the problem. Endow File with a member function

// returns TRUE if open succeeded in constructor
Truth File::isok();

whose sole purpose is to report on the status of the open performed in the constructor. Perhaps a separate isok() function is unnecessary; iseof() can report on the open. But assigning this function to iseof() is bad design for two reasons. First, checking for end-of-file is conceptually a completely separate matter from checking to see if the open succeeded. And second, how are we to interpret the return value of iseof() on a newly created file for writing? To play it safe, we will have two predicate functions. The File type was originally developed to support a lexical scanner object. To make implementing the scanner easier, we included the following additional member functions in File's public interface:

class File {
public:
    ...
    void unget(int c);
    int peek();
    ...
};

unget() pushes the character c back onto an input stream. c is the next character get() gets. peek() returns the value of the next character without removing it from the input stream. As we have alluded, we can piggy-back or layer the implementation of File on top of standard I/O FILE. One easy implementation is found in Listing 1.
The only thing difficult about this layered implementation was figuring out that we needed a state data member for recording the status of the open. All the member functions, with the exception of the constructor, are one-liners.

Wrap-up

In this column we have begun applying an object-oriented design framework to the problem of constructing a text editor. Starting from a description of the editor's behavior, we have identified three types of objects: Editor, Buffer, and File. We discussed how File might be used by other types and let that guide us in identifying its public interface. We then wrote a portable implementation of File layered on top of the standard I/O FILE abstraction. In the next column, we will continue on our journey, focusing our attention on the Buffer abstraction. In the course of designing Buffer, we will become acquainted with two useful parametric container types, Sloop[T] and Yacht[T], that will make the implementation of Buffer rather simple.

Bibliography

[1] Brian Kernighan and Dennis Ritchie, The C Programming Language, second edition, 1988, Prentice Hall.
[2] Bertrand Meyer, Object-Oriented Software Construction, 1988, Prentice Hall. (Addresses object-oriented design, including parametric types.)

Listing 1

class File {
public:
    File( char *name = "", char *mode = "r")
    {
        if( *name )
            fp = fopen( name, mode);
        else if( *mode == 'r' )
            fp = stdin;
        else
            fp = stdout;
        state = (fp != 0);      // record whether the open succeeded
    }
    ~File()           { if( fp) fclose( fp); }
    Truth isok()      { return state; }
    Truth iseof()     { return feof( fp); }
    int get()         { return getc( fp); }
    void unget(int c) { (void)ungetc( c, fp); }
    int peek()        { int c = get(); unget(c); return c; }
    void put( int c)  { putc( c, fp); }
private:
    FILE *fp;
    int state;
};

Questions & Answers

Readability, Portability, And Coding Style

Ken Pugh

Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and is a member of the ANSI C committee.
He also does custom C programming for communications, graphics, and image databases. His address is 4201 University Dr., Suite 102, Durham, NC 27707.

Q I would appreciate your comments on the following questions and problems:

1. Type char: signed or unsigned? Most compilers consider chars as signed by default. We, European users, make extensive use of ASCII codes above 127 and the signed chars default does not seem to be the best choice. Which mode, in your opinion, is "better"? Why are constant chars considered as ints? The following:

char c = 'è';
if (c == 'è')

will work only if the default char is unsigned. Otherwise, a cast to (char) is necessary to get the program to work, yet the constant 'è' is clearly a char, not an int.

2. Good use or abuse of #defines and typedefs? What does one think of the current practice of #defining or typedefing native C types, like char into BYTE, unsigned char into BYTE or UBYTE, char * into TEXT, int into COUNT, int into BOOL, etc.? Is there really a reason for this (except (sometimes!) for portability, of course)? There are no such things (as far as I know) in the standard library header files! Moreover, when strictly prototyped programs are compiled the result is generally a long list of type-mismatch errors (often pointer mismatches between (char *) and (unsigned char *)).

3. New C programming style What do you think of the 'new' (?) C style of programming, à la Pascal, with (long) identifiers mixing lowercase and uppercase and banishing the underscore?

Thanks for your opinion and sincerely yours, Hubert Toullec Angers, France

A In the ANSI C committee meetings, there was considerable discussion as to whether a particular feature of the language should be made right or whether backward compatibility should be preserved, to avoid "breaking" existing programs that used documented features of the language.
If George Burns (in "Oh, God!") remade the world from scratch, he "would make the avocados with smaller seeds"; judging from the committee's discussion of this topic, remaking C is much more complex. Several features were left unchanged for the sake of backward compatibility, including the priority of the operators (even though some of the bitwise operators could be used more comfortably if the priorities were modified). Similarly, the type of plain chars was specifically left unchanged and thus remains unspecified (i.e., not specifically typed as signed or unsigned). I agree with you that unsigned chars are more useful. I sometimes use the char type to hold small integer values, but they are usually non-negative integers. The char data type has been converted to int since the early days of the language. That eliminates having separate rules for character arithmetic. Character constants should be treated the same way (signed or unsigned) as character variables. Note that standard ASCII includes only seven-bit characters, so none of its values have the high-order bit set. The C language does not specify that programs must run if you include non-ASCII characters. (Actually it specifies exactly which source characters are acceptable, but that basically is the ASCII set.) With your example,

char c = 'è';
if (c == 'è')

you have used a character that is not specified as being standard. The compiler is not even obliged to compile the code. If you used the octal or hexadecimal escape sequence to represent the character, then the compiler would treat it as a regular character constant. I compiled with QuickC and ran the program in Listing 1 with one unexpected result. The results were:

Unequal -118 138
(char) Equal -118 -118
Hex Equal -118 -118
Hex (char) Equal -118 -118

Notice that the compiler treated both the char variable and the char constant as signed. However, it treated the non-standard character as a regular integer value.
Some compilers provide a runtime switch on the interpretation of character variables. You might try using one that has such a switch. On your next question, I am strongly in favor of using typedefs to define logical data types. Using typedefs is preferable to using #defines for consistency's sake, as there are many types which cannot be described in terms of a #define. Declaring variables with typedefs captures a significant amount of information for the maintenance programmer. Unfortunately the C standard, in my opinion, does not go far enough in checking the use of typedefs. My favorite illustration is:

typedef double SPEED;
typedef double TIME;
typedef double DISTANCE;

SPEED compute_speed(time, distance)
TIME time;
DISTANCE distance;
{
    SPEED speed;

    if (time != 0.0)
        speed = distance / time;
    else
        speed = 0.0;
    return speed;
}

and in another program:

SPEED car_speed;
TIME car_time;
DISTANCE car_distance;

car_speed = compute_speed(car_time, car_distance);
car_speed = compute_speed(car_distance, car_time);

Under the ANSI standard, both of these function calls are compatible, but logically one is erroneous. Some super lint or the compiler itself may one day use the typedef information for error checking. I agree that there is a problem with the type checking performed when comparing or assigning unsigned char pointers and regular char pointers. This problem is most irritating when it forces you to write the declaration

unsigned char *string = "ABC";

with a cast as:

unsigned char *string = (unsigned char *) "ABC";

The ANSI committee debated whether it would be okay to not require such a cast in an initialization statement, but decided that consistency in typing was more important. Of course, I strongly urge using full names for the type names, e.g., BOOLEAN instead of BOOL, etc. On your final question, I am in favor of readable and meaningful variable and function names.
Some people may have heard of studies that conclude otherwise, but ALongVariableName appears less readable to me than a_long_variable_name. The latter appears closer to what you would expect to read in normal text. How much you should use abbreviations in naming is an open issue. The more abbreviations you use, the more you will have to remember and the more the maintenance programmer will have to infer and comprehend when reading the program. For example, XMT for transmit and TX for transaction may be common, but does CMP stand for compare or compute?

Q I am developing a simulation program for study of our company's manufacturing plant using C language compilers on an IBM-PC/AT machine. I shall be thankful to you for sending information on various software tools in C language for incorporating graphics in the program. P.K. Gupta Gujarat, India

A The only package with which I personally have extensive experience is Essential Graphics by South Mountain Software, Inc., 76 So. Orange Avenue, South Orange, NJ 07079, (201) 762-6965 ($299 list, $230 street). You can distribute products built with Essential Graphics royalty-free, and you can use direct coordinates (your x,y values specify an exact pixel location) or world coordinates (your x,y values are transformed into a pixel location), the latter at some price in speed. The names in this package are somewhat unintelligible, since the developers tried to stay within an eight-character name limit. For example: grbx draws a box, grwx draws an x at a point, hsrect draws a rectangle with a hatch style and a label. As I mentioned above, I would prefer something like graph_box, graph_write_x, and hatch_rectangle_with_label. Essential Graphics also supports loading and saving PC Paintbrush .PCX files. There are several other packages on the market, including Halo Graphics and Advantage Graphics. Perhaps some of our readers may have comments on these or other packages.
Reader Responses: Commodore 128

In the May 1989 issue of The C Users Journal, I took note of the questions by Mr. David Ockrassa regarding printing special characters such as the braces, vertical bar, and tilde on the Commodore 128. Before I started programming the Amiga in C, I dealt with the same problem. The problem is two-fold in nature. Because these characters are not in the standard font set of the Commodore 128, the C language packages for that machine generally include an editor that redefines several character bitmaps to supply the missing ones. Each such character is saved with the file as a non-ASCII byte. The problem occurs when the file is printed, because the redefined characters may or may not have the same font set as that of the printer being used. The solution is to write a small printer utility in C. The accompanying code (Listing 2) accomplishes this task, and is available on most commercial bulletin boards. I wrote several printer drivers of this type for the Commodore 128 for use with different printers that have a few more features than the included code, such as pagination and filename/date headers. John D. Clark St. Louis, MO

MS Dynamic Data Exchange:

This letter is in response to Ken Libert's request for material concerning MS Dynamic Data Exchange. If you contact Microsoft's product support services and ask for Windows Software Development Kit support, you can request their Application Notes concerning Dynamic Data Exchange. With this publication you get a disk complete with examples and source. The DDEAPP example allows you to initiate a session with Excel and actually exchange cell data in multiple formats.
Tim Kuntz University of Pittsburgh

Listing 1

#include <stdio.h>

main()
{
    char c = 'è';           /* character with code 0x8A */
    char c1 = '\x8A';

    if (c == 'è')
        printf("\n Equal %d %d", c, 'è');
    else
        printf("\n Unequal %d %d", c, 'è');

    if (c == (char) 'è')
        printf("\n (char) Equal %d %d", c, (char) 'è');
    else
        printf("\n (char) Unequal %d %d", c, 'è');

    if (c == '\x8A')
        printf("\n Hex Equal %d %d", c, '\x8A');
    else
        printf("\n Hex Unequal %d %d", c, '\x8A');

    if (c == (char) '\x8A')
        printf("\n Hex (char) Equal %d %d", c, (char) '\x8A');
    else
        printf("\n Hex (char) Unequal %d %d", c, '\x8A');
}

Listing 2

/* Printer driver for Gemini 10x. The open()/close() calls and the
   integer FILE values follow the Commodore 128 C library, not ANSI C. */
#include <stdio.h>
#include <ctype.h>

main(argc, argv)
int argc;
char *argv[];
{
    unsigned int count;
    FILE infile, outfile;
    int c;                          /* int, so EOF can be detected */

    outfile = 5;
    open(outfile, 4, 7, " ");       /* device 4: the printer */
    for(count = 1; count < argc; count++) {
        infile = fopen(argv[count], "r");
        while((c = getc(infile)) != EOF) {
            switch(c) {
            case '{':  c = 123; break;
            case '}':  c = 125; break;
            case '\\': c = 92;  break;
            case '~':  c = 126; break;
            case '|':  c = 124; break;
            case '_':  c = 95;  break;
            default:
                if(islower(c))
                    c += 32;
                else
                    c -= 128;       /* PETSCII-to-ASCII adjustment */
            }
            putc(c, outfile);
        }
        close(infile);
    }
    close(outfile);
}

New Releases

Prolog And 'Curses' Added To Library

Kenji Hino

New Releases CUG297 -- Small Prolog Henri de Feraudy (France) has submitted a public domain Prolog interpreter. His Small Prolog follows a Cambridge syntax (LISP-like syntax) that has advantages for meta-programming and small code. Small Prolog includes most of the standard built-ins (predicates) based on Clocksin and Mellish's descriptions in Programming in Prolog, although it can be extended by creating more user-defined built-ins. The disk includes C source files, make files, documentation, and many Prolog example files that demonstrate Prolog features for C programmers who may be unfamiliar with Prolog. The source code is very portable and will compile under Turbo C v1.5 and Mark Williams Let's C v4 on PC clones, Mark Williams C v3.0 and Megamax Laser C on the Atari ST, and the Sun C compiler on the Sun-3. CUG298 -- PC Curses Jeffrey S.
Dean has contributed PC Curses, v0.8. This shareware release of PC Curses is a C window-function library designed to provide compatibility with the UNIX curses package. By fully utilizing the PC's features, this package is coded much more simply than the UNIX version. For example, there is no need for cursor-motion and screen-output optimization on the PC. Currently, there are two major versions of the curses database under UNIX; one is termcap, the other terminfo. PC Curses derives primarily from the former version, with some features of the latter. Moreover, additional routines (not in the original curses package) are provided for the PC user. The distribution disk includes a couple of demo programs, small- and large-model libraries for the Microsoft C v5.0 and Turbo C v1.5 compilers, and documentation that describes all the functions in the library. The source code is obtained by paying a $20 fee directly to the author.

Updates

CUG220 -- Window BOSS

Phillip A. Mongelluzzo (CT) from Star Guidance Consulting has submitted Revision 07.01.89 of The Window Boss. This release provides additional data-entry routines along with support for user-defined physical sizes (i.e., 43- and 50-line EGA/VGA screen sizes).

CUG198 -- MicroEmacs Source

William Bader has extensively updated the text editor MicroEmacs to v3.9. His update includes not only bug fixes of the old version, but also additional commands, portability improvements, and performance enhancements. The new features of MicroEmacs include built-in emulation of the DEC EDT editor, support for VT100/VT200 keypads, function keys, and scrolling regions, better VMS support such as a filter-buffer command and preservation of record-format attributes, extra commands such as inserting a C-format octal escape sequence and scrolling the screen horizontally, a callable interface to Emacs (you can call Emacs as a function), VMS subshell routines, support for ANSI color, a BINARY mode for MS-DOS, a pull-down menu, and more.
The enhancements include a faster search routine, faster lookup for normal keys and FNC macros, and a faster display routine. Bader has tested the new version of MicroEmacs using the following compilers and operating systems: VAX-11 C under VMS 4.1 on a VAX-11/750, Microsoft C 5.0 under MS-DOS 3.20, Turbo C 1.5 under MS-DOS 3.20, CI86 2.30J under MS-DOS 3.20, Microsoft C under XENIX 386, cc under SunOS 3.5 on a Sun 3/360C, cc under SunOS 4.0 on a Sun 386i, and cc under BSD 2.9 on a PDP-11/70. In order to create an executable for your environment, you need to turn on/off the switches for Machine/OS definitions, Compiler definitions, Terminal Output definitions, and Configuration options in the header file, estruct.h. The distribution setting is to compile under MS-DOS using Turbo C.

On The Networks

How To Get Net Software

Sydney S. Weinstein

Sydney S. Weinstein, CDP, CCP is a consultant, columnist, author, and President of Datacomp Systems, Inc., a consulting and contract programming firm specializing in databases, data presentation and windowing, transaction processing, networking, testing and test suites, and device management for UNIX and MS-DOS. He can be contacted care of Datacomp Systems, Inc., 3837 Byron Road, Huntingdon Valley, PA 19006-2320. First, an introduction, and a thank you. I am the new "Contributing Editor" of the "On The Networks" column. I have written before for The C Users Journal, so hopefully I won't be a total stranger to you. And, as David Fiedler said in the last CUJ, I am the Elm coordinator. (Elm, itself, is a large piece of freely distributable software.) I can be reached at syd@DSI.COM, for those with Internet access, or at {bpa, vu-vlsi}!dsinc!syd for those without Internet access. I don't plan any change in the scope or content of this column. I will attempt to report on the latest freely distributable software available on Usenet and the Internet.
As David did, I am willing to forward a list of neighboring sites for access, provided you send me a self-addressed, stamped envelope; if you have net access but need a news neighbor, I will also reply to electronic mail asking for nearby news sites. To David Fiedler, a well-earned thank-you for his two-year tenure in this spot. Many megabytes of useful software were highlighted here. His tireless efforts to find neighbors for those sites that requested them are also gratefully appreciated. It was with his help that our site found its first news neighbor several years ago. However, I highly doubt I can keep up with his run of puns. Some Definitions For the past two years, the terms Usenet, Internet, internet, and "the net" have been bandied about in this column. I would like to add a new one: "freely distributable software." Some definitions are in order. Usenet, often referred to as "the net," is a loose collection of cooperating computers. In the past, all of Usenet ran UNIX, but now, with other computers and operating systems supporting UUCP, hosts can be running anything from MS-DOS to VAX/VMS. All that is required to be considered a computer on Usenet is that you communicate via the UNIX-to-UNIX Communications Protocol (UUCP) with another computer. Usenet consists of electronic mail, file transfers, and network news. It is via network news that most of the programs you read about in this column are distributed. If your computer talks to Usenet or to another computer via some protocol other than UUCP, you are considered to be on an internet (lowercase "i"), short for inter-network. This just means that you are using some network other than the UUCP-based Usenet. This generic internet includes "the Internet" and several other networks such as CSNET and BITNET. The actual connection to Usenet is via a gateway computer that talks to both the network you use and Usenet. 
The Internet (capital "I") is the computer network loosely managed by the Network Information Center at SRI. The Internet is a collection of networks that grew out of the Defense Department's ARPANET (Advanced Research Projects Agency Network). Usenet sites make phone calls to other computers; the Internet is mostly machines connected with dedicated leased lines. These lines usually run faster than the dial-up lines used by UUCP. The Internet has many sub-networks associated with it, including NSFNET, the National Science Foundation Network. These newer networks run at much higher speeds and currently also pick up a lot of the long distance traffic for Usenet's Network News. In my area, the local NSFNET-related network is called PREPnet and has a backbone consisting of 1.544Mb/s (million bits per second) data links; each site has either a 1.544Mb/s or a 56kb/s (thousand bits per second) hookup to the network. The main NSFNET backbone is now all 1.544Mb/s data links and is quickly upgrading to 45Mb/s data links as they become available. Whereas only mail and news are usually available over Usenet via UUCP, the Internet runs the TCP/IP protocol and supports news (NNTP, Network News Transfer Protocol), mail (SMTP, Simple Mail Transfer Protocol), remote logins to any computer on the network provided you have an account there (telnet), remote file transfer (FTP, File Transfer Protocol), and many other services. All of these services coexist and work in real time. The problem with the Internet providing much of the bulk transfer for Usenet is that the two use different addressing methods. Since a large amount of the software mentioned in this column comes from Usenet or the Internet, you'll need to understand how to format the two types of addresses. A UUCP or Usenet address is made up of site names separated by exclamation points, as in bpa!dsinc!syd. 
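Mechanically, a bang path is just a list of hop names read left to right; a minimal sketch of pulling one apart (a hypothetical helper, not part of any real mail software):

```c
#include <string.h>

/* Split a UUCP bang path such as "bpa!dsinc!syd" into its hops.
 * Writes up to maxhops pointers into hops[] (pointing into path,
 * which is modified in place) and returns the number of hops.
 * The last element is the mailbox; the others are relay sites. */
int split_bang_path(char *path, char *hops[], int maxhops)
{
    int n = 0;
    char *p = strtok(path, "!");

    while (p != NULL && n < maxhops) {
        hops[n++] = p;
        p = strtok(NULL, "!");
    }
    return n;
}
```

Applied to bpa!dsinc!syd, this yields the relay sites bpa and dsinc followed by the mailbox syd; each site in the chain strips its own name and forwards the remainder.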
If a site wants to mention more than one "well-known site" to use as a route, it usually lists them in curly braces as in {bpa, vu-vlsi}!dsinc!syd (meaning you can use either bpa!dsinc!syd or vu-vlsi!dsinc!syd). Such addresses assume that you know the complete path from your site to one of the named "well-known sites". Some systems run programs to help with this routing, and Usenet's UUCP Mapping Project publishes maps to automate this process. However, not all sites have registered to be listed in these maps. Registration is free and accomplished by sending your entry to rutgers!uucpmap. The maps are continuously updated and distributed via the comp.mail.maps news group. On the Internet, all sites have a unique "Fully Qualified Domain Name" which is administered by the NIC. My site's domain name is node.DSI.COM, where node is the individual computer at my site. Thus, my full current address is syd@dsinc.DSI.COM, but our mailer, like the mailers at a lot of Internet sites, is smart and knows how to forward the mail to me even if you send it to syd@DSI.COM. This allows me to move around within the DSI.COM domain without having to tell everyone a new address. The Internet does not require users to know the complete path to the site; it is sufficient to know the domain name. Now a word of warning. Mixing both @ and ! in the same address leads to trouble. Not everyone follows the standard and processes the addresses correctly. Converting sitea!user@DSI.COM to a UUCP address would produce dsinc!sitea!user. Note that the @ has higher precedence than the !. Many sites get this wrong, causing your mail to bounce (be returned to you as undeliverable). Some sites, ours included, allow UUCP mail to have addresses including domain names in the ! path, as in dsinc!host.domain.type!user. Where allowed, this convention is usually more reliable than mixing the ! and @s. Lastly, what is Public Domain Software and what is Freely Distributable Software? 
Much of the software described in this column is free in that no licensing fee is required for personal use. In some cases even commercial users aren't required to pay a licensing fee. However, almost all of the software mentioned in this column is not in the Public Domain. For software to be in the Public Domain, either the copyright must expire (and not be renewed) or the authors must specifically renounce copyright protection. The copyright to most software mentioned in this column is reserved by the author or some group. Though the copyright is reserved, the holders have given the user the right to use and distribute the software without fee. This does not place the software in the public domain. You still cannot sell this software nor pretend that you wrote it. Many of the licensing agreements restrict how the software can be used for business purposes. Freely Distributable Software is also different from Shareware. Shareware expects (but doesn't require) the user to pay a fee if they intend to continue using the program. Freely distributable software does not. Now, how do you get the software mentioned in this column? Much of it is distributed in Usenet's network news, especially in the comp.sources.unix and comp.sources.misc news groups. Game software is in the comp.sources.games group. There are also groups for Amigas, Atari STs, Macs, Suns, and computers running the X windowing system. The Usenet news groups are distributed via a store-and-forward broadcast from Usenet neighbor to Usenet neighbor, either via UUCP or NNTP. However, news articles are kept online at a particular site for only a short period of time, usually less than two weeks. By the time a piece of software appears in this column, it will long since have expired and been deleted. Thus, it is necessary to access a news archive site. Many sites around the country have agreed to archive specific news groups. 
These sites are listed in the comp.archives news group. Many sites are also identified as archive sites in their Usenet Mapping Project map entry. Some have even been listed in this column. These sites allow access to their archives to retrieve the sources. How one accesses the archives depends on where they are and how each site has set up access. Most archives are available for either FTP or UUCP access, and a few allow both. If a site supports FTP access, you need to be on the Internet to reach it. FTP opens a direct connection to the FTP server on that system and transfers the files directly to your system. FTP will prompt for a user name and optionally a password. Most FTP archive sites allow a user name of anonymous. If it then prompts for a password, any password will work, but convention and courtesy dictate that you use your name and site address for the password. If a site supports UUCP access, anyone with UUCP can access the archives. Most sites of this type publish a sample entry for the Systems (L.sys) file showing the system name, the phone number of their modems, the connection speeds supported, and the login sequence. Using the uucp command, one can poll the system directly and retrieve the software. Many sites restrict the hours during which you should access their modems. Courtesy dictates that you follow their requests, and some sites enforce the limit with programs. Be sure to call far enough before the end of the period to complete your transfer in time. A third method, used for smaller files, allows access to an electronic mail-based archive server. With these sites, you send an electronic mail message to the archive server's mailbox name specifying the files you wish. The files are then returned to you via electronic mail. Remember that many sites have a limit on the size of a single mail message, so don't ask for too much at once. 
Also remember that the archive server is a program, so phrase your request exactly as specified in the instructions for that archive server, and limit your message to exactly that request. Other comments in the message could confuse the program, and it might not honor your request. Lastly, for those sites not connected to any network, some sites will copy the software onto your media if you send them a disk or tape along with return postage and a mailer. Other sites sell media with the software already copied onto it. This is especially useful for the largest distributions, such as the X windowing system, which fills multiple tapes. For those sites without Internet access that do subscribe to UUNET, UUNET will retrieve the files via FTP and make them available for UUCP access. And to come... Starting in February, back to more new software from Usenet's source newsgroups and news from the Internet and public access sites. If you have an archive of UUCP-accessible software and would like even more access to it, drop me a note via electronic mail and I'll try to get it into an upcoming column. Until then, a slight paraphrase of David's tag line: see you on the nets! PC-METRIC -- A Measuring Tool For Software Larry Versaw Larry Versaw is a systems engineer at Electronic Data Systems' Corporate Communications Division. His 1984 master's thesis was entitled Measuring the Size, Structure, and Complexity of Software. He may be contacted care of 5400 Legacy Drive, Plano, TX 75024. Have you ever wanted to compare the complexity of two programs or to tell how long it took to develop them? Have you ever needed a precise measure of programmer productivity? No one yet can produce truly reliable answers to these problems, but researchers in the 1970s invented many software metrics and have since conducted hundreds of experiments to see what information could be derived by analyzing program source code. 
Some metrics purport to measure software complexity; others gauge program size or calculate how well structured a program is. The researchers developed many static code analyzers for use in their software metrics experiments, but few such tools were commercially marketed. PC-METRIC, developed by SET Laboratories, Inc., is one of the few stand-alone software metrics programs, if not the only one, sold today. To evaluate PC-METRIC, I tried it out on 80 source files containing 25,000 lines of working C code. That exercise proved PC-METRIC to be a reliable product, an efficient program measurement tool that would be indispensable to anyone wishing to use software metrics in his work. The Product For this article I evaluated v1.1, then v2.3 of PC-METRIC. In addition to the C language versions which I examined, SET Laboratories has produced metrics programs for Ada, Assembler, COBOL, FORTRAN, Modula-2, and Pascal. Some languages are supported on systems other than MS-DOS. PC-METRIC specializes in static code analysis; that is, it reports certain quantifiable attributes of program source code without executing it. These attributes include the number of source lines, the number of executable statements, and a dozen other quantities derived by counting certain program elements. Software metrics experiments have usually shown correlations between these kinds of metrics and actual, observed software management factors, such as programmer skill, number of remaining bugs, and actual programming effort. PC-METRIC is based on the work of several of the pioneers in software metrics, notably Tom McCabe and Maurice Halstead. McCabe [McCabe 1976] proposed a measure of program control flow complexity based on a program's directed control flow graph. This metric, called cyclomatic complexity, may be calculated as one plus the number of branches (if statements, loops, alternatives in case statements) in a program. 
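McCabe's count is simple enough to sketch. The toy function below illustrates the idea only: it scans for branch keywords as whole words, and unlike a real analyzer such as PC-METRIC it does not tokenize properly or skip comments and string literals.

```c
#include <ctype.h>
#include <string.h>

/* A naive sketch of cyclomatic complexity: one plus the number
 * of branch points found in the source text. */
int cyclomatic(const char *src)
{
    static const char *branch[] = { "if", "while", "for", "case" };
    int v = 1;                           /* McCabe's "one plus..." */
    size_t i, k, len = strlen(src);

    for (i = 0; i < len; i++) {
        if (i > 0 && (isalnum((unsigned char)src[i-1]) || src[i-1] == '_'))
            continue;                    /* not at a word boundary */
        for (k = 0; k < sizeof branch / sizeof branch[0]; k++) {
            size_t w = strlen(branch[k]);
            if (strncmp(src + i, branch[k], w) == 0 &&
                !isalnum((unsigned char)src[i+w]) && src[i+w] != '_')
                v++;                     /* one more branch point  */
        }
    }
    return v;
}
```

A function with a single if and a single for loop, for example, scores 3.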
PC-METRIC reports two variants of cyclomatic complexity for each function it analyzes. McCabe's metric is widely accepted and intuitively satisfying as a complexity measure because it represents the amount of program logic that must be understood and retained to understand an algorithm. One of the most imaginative and ingenious models of software, including software size, was developed by the late Professor Halstead [Halstead 1977]. Halstead's system, labeled software physics, is ultimately based on counts of operators and operands in program source code. Several of PC-METRIC's metrics, including length, estimated length, purity ratio, volume, the effort metric, estimated time to develop, and estimated errors, are implementations of Halstead's software physics formulae. Some have seriously questioned the theoretical basis underlying Halstead's model, and Halstead's attempt to bring theory from the realm of psychology to bear on software development has been widely discounted [Coulter 1981, Perlis 1981]. On the other hand, some rather impressive correlations have been observed between certain of Halstead's metrics and such management factors as code quality, programming time, and debugging effort [Gordon 1979, Curtis 1979, Funami 1976, Paige 1980]. If you are experienced with software metrics, you may find some of your favorite metrics missing from PC-METRIC's repertoire. However, PC-METRIC supports more measures of program size and complexity than are actually needed. Most size and complexity metrics are highly correlated with each other, so that beyond the first two or three, additional size and complexity metrics are redundant. In a study which analyzed great quantities of source code written by diverse programmers in C, Ada, PL/I and Pascal, no statistically significant differences were found among the reliability of different size metrics [Versaw 1984]. They all measure the same attribute, after all. 
Variations in programming style notwithstanding, it is my belief that lines of code remains as good a measure of program size as any other measure we have today, and is almost as good a measure of complexity as any other. Research continues on the subject, but on a smaller scale than ten years ago. Installing and using PC-METRIC is simplicity itself. A user must learn only one command, CMET, which runs interactively or in batch mode. PC-METRIC can be configured for different dialects of C by modifying a table of keywords and symbols stored in an ASCII text file. As PC-METRIC analyzes source code, it produces two reports. The complexity analysis report lists the metrics values calculated for each function, and the combined values for the entire module being considered. In the new version of PC-METRIC, SET has remedied the worst problem with version 1.1, which was its inability to analyze units of source code larger than one file. The second report file, called the exceptions report, highlights all measured values that lie outside predetermined, user-defined limits. Both the analysis report and the exceptions report are output as ASCII files. In the current version of PC-METRIC, these reports are suitable for printing without any manual editing or reformatting. The current version also provides a CONVERT utility which can convert the report data into a comma-delimited text file suitable for uploading into many spreadsheet or database packages. This is an especially valuable addition to the PC-METRIC package. Program attributes that cannot be measured by simply counting certain operators and identifiers are unfortunately beyond the scope of PC-METRIC. These would include attributes such as the degree of information hiding, module coupling, function binding, and efficiency. If we could only measure these attributes objectively and automatically, it would greatly enhance the practice of software engineering. 
Where PC-METRIC does excel is in calculating reliably the most common size and complexity metrics with a minimum of fuss at a reasonable speed (4000 lines per minute on a 10 MHz AT-type computer). System Requirements PC-METRIC requires far less memory and disk space than any C compiler would, so hardware requirements do not limit its use. The Audience PC-METRIC is intended primarily for two kinds of users. The first is software developers who would use a statistical analysis of their code as a help in identifying overly complex modules or functions. The PC-METRIC manual correctly identifies programmer feedback as an important application of PC-METRIC. The second kind of person who needs PC-METRIC is the manager or software project leader who would use software metrics as a tool to monitor programmer compliance with local standards of function size, module complexity, or other quantifiable program aspects. Documentation All bases are covered in PC-METRIC's three-part manual. Part 1 provides a well-written tutorial on the field of software metrics, concentrating on the specific metrics obtainable with PC-METRIC. It even includes a brief annotated bibliography of software metrics literature. Users with little prior exposure to the field of software metrics should be sure to read this part. Part 2 describes how to install, configure, and use PC-METRIC. It too is well organized and gives the right number of examples. PC-METRIC's counting strategy is documented toward the end of this section. Part 3, "Applying PC-METRIC", instructs users on what to do with all those numbers PC-METRIC generates. It first documents the indispensable new CONVERT utility mentioned above. Then it explains ways to interpret the results: how to properly use software measures as a feedback tool or resource estimation tool in practice. Support SET Labs offers technical support by telephone for their products and will answer general questions on software metrics. 
SET offers site licensing as well as individual licenses. If you have a particular machine or language for which you would like a version of PC-METRIC, SET Labs will usually do a port for the price of a single site license. Conclusions PC-METRIC is an indispensable tool, and perhaps the only tool in its class, for analyzing program size and complexity using the software metrics it provides. By cleaning up the reports and by providing the CONVERT utility, the new version of PC-METRIC has enhanced users' ability to analyze and apply program metrics. PC-METRIC applies state-of-the-art methods for objectively measuring two basic attributes of program source code: size and complexity. The usefulness of these measures is variable, but not because of any deficiency in PC-METRIC itself. PC-METRIC, and counting programs in general, find their surest application in measuring adherence to a specific coding standard. I recommend PC-METRIC for programmers and managers as a tool for monitoring adherence to their coding standard, which could, and probably should, include some complexity metrics. I recommend it also as a tool for identifying overly complex modules that need extra testing or rewriting. The list price is $199. You can contact SET Labs for more information at P.O. Box 86327, Portland, OR 97283, (503) 289-4758. References Coulter, Neal S., Applications of Psychology in Software Science, Proceedings of IEEE COMPSAC 81, (1981), 50-51. Curtis, Bill; Sheppard, Sylvia; and Milliman, Phil, Third Time Charm: Stronger Prediction of Programmer Performance by Software Complexity Metrics, Proceedings of the Fourth International Conference on Software Engineering, (1979), 356-360. Funami, Y., and Halstead, M.H., A Software Physics Analysis of Akiyama's Debugging Data, Proceedings of the Symposium on Computer Software Models, (1976), 133-138. Gordon, Ronald, A Quantitative Justification for a Measure of Program Clarity, IEEE Transactions on Software Engineering, IV (March 1979), 121-128. 
Halstead, Maurice, Elements of Software Science, New York, Elsevier, 1977. McCabe, T.J., A Complexity Measure, IEEE Transactions on Software Engineering, II (December 1976), 308-320. Paige, M., A Metric for Software Test Planning, Proceedings of IEEE COMPSAC 80, (1980), 499-504. Perlis, Alan J.; Sayward, Frederick G.; and Shaw, Mary, editors, Software Metrics: An Analysis and Evaluation, Cambridge, Massachusetts, MIT Press, 1981. Versaw, Larry, A Tool for Measuring the Size, Structure and Complexity of Software, thesis, Denton, Texas, North Texas State University, 1984. GRAD Graphics Library Ron Burk and Helen Custer Ron Burk has a BSEE from the University of Kansas and has been a programmer for the past 10 years. He is currently president of Burk Labs, a small software consulting firm. Helen Custer holds degrees in Computer Science, English, and Psychology from the University of Kansas and is currently a Senior Software Technical Writer for a Fortune 500 company. She has coauthored books on C, GW-BASIC, and Z-BASIC. Both may be contacted at Burk Labs, P.O. Box 3082, Redmond, WA 98073-3082. The GRAD Graphics Library, written by Conrad Kwok, is a shareware package for drawing simple graphics images, including circles, lines, ellipses, arcs, and rectangles. It can also fill regions, display characters, and dump screen graphics to your printer. The 50-odd graphics functions are carefully documented in a 100-page user's manual. The package is written in Microsoft C v4.0 and 8088 assembly language for PC/XT/AT clones. GRAD can also be compiled with Turbo C, and directions for doing so are included on the disks; a few minor changes are required. GRAD supports both CGA (640 x 200) and HGA (720 x 348) graphics cards, but unfortunately, it only supports one device at a time. You link with the library that corresponds to the device you want to use; there is no auto-detection of the graphics adapter. 
The routines are modularized in such a way that it may be possible to make them work with other graphics devices by changing one or two files of source code. However, a graphics device is not absolutely necessary, as GRAD allows you to define up to nine virtual graphics screens at run time. GRAD also supports several printers, including the Epson FX-80, the Okidata ML192, and compatibles, as well as laser printers using the JLASER card. You can also configure other printers to work with GRAD. The GRAD user's manual and assorted documentation files thoroughly document the functions available in GRAD. The writing is friendly and, in addition to GRAD, the files document a number of concepts relating to graphics libraries in general. The documentation describes how fonts are viewed by a graphics package, how to use a graphics coordinate system, and what a virtual graphics screen is, among other things. Example code is provided for functions that are difficult to describe. Pixels Vs. Lines The graphics screen on most personal computers is pixel-oriented; it is made up of dots that you can turn on or off. A pen plotter, on the other hand, is line-oriented; everything it draws is made up of line segments. The GRAD library is oriented toward pixel graphics. For example, it supports the ability to "grab" a rectangular portion of the screen and transfer it to another part of the screen. That sort of operation could not be implemented with a pen plotter. You could, however, use GRAD as a PC device driver for a more general set of line-oriented library functions. GRAD supplies almost all of the primitives you would need for such a project. Also, most printers support pixel graphics, so they can serve as hard-copy devices for programs that use pixel graphics. If your printer is similar to the Epson FX-80 or the Okidata ML192, you can adapt the software to work with your printer. An appendix at the back of the manual documents that process. 
In the general case, however, you may have to buy the source code from the author to make GRAD work with your printer. Standard Transformations In some kinds of graphics, you find yourself drawing the same basic symbol in slightly different ways (different proportions, different locations on the screen, and so on). Three types of transformations of graphics are commonly supported by high-level graphics libraries: Translation--Moving a graphic element (a square, for example) to a new location. Scaling--Making a graphic element appear shorter and fatter, or taller and thinner. Rotation--Turning a graphic element around an axis. GRAD supports graphics translation by allowing you to change the value for the upper left corner, or "origin", of your frame. For example, if you wish a graphic element to appear multiple times in your final drawing, you can create a subroutine that draws the element, then call that subroutine multiple times. Between calls to the subroutine, you simply change the origin for the element. GRAD does not support graphics scaling or rotation. If you want to draw the same symbol with different heights or widths, you must implement the scaling with your own code. Likewise, if you want to rotate a graphic image so that it appears sideways or upside-down, you'll have to write your own code to do this. One reason you might want to do a transformation such as scaling is to solve the problem of aspect ratio. Aspect ratio is the ratio of a pixel's height to its width. GRAD assumes that each pixel is square, the same height as width. However, on a typical CGA monitor, each pixel is rectangular instead of square, that is, its aspect ratio is not 1:1. The aspect ratio problem becomes very clear when you ask GRAD to draw a circle on a CGA monitor. It draws a true circle, but because the pixels are not square, the result on the screen is a "stretched" circle (an ellipse). 
In a line-based graphics library, this problem can be solved by applying the appropriate scaling transformation just before translating the line into pixels. In GRAD, however, there isn't much you can do except take the problem into account in your code and draw a rectangle to get a square, an ellipse to get a circle, and so on. Virtual Screens A virtual screen is just like the real screen in every way--you just can't see it. Suppose you want your graphics program to have the ability to undo the last drawing request the user made. One way to accomplish the visual part of this task is to use a virtual screen. For each user request that is not an undo request, you first perform the previous request on the virtual screen, then perform the new request on the real screen. If the request is an undo, you could simply copy the virtual screen to the real screen. GRAD provides virtual screens which it calls "frames". A frame is a rectangular memory area where a graphic image is stored. If the memory area corresponds to video memory, then the graphic is visible on the screen. If the memory area is regular memory, the frame is a virtual graphics screen. A graphic image created in this area can only be seen by dumping it to the printer or by copying it to the video memory. Frames are especially useful for windowing operations, as described in the following section. Drawing Attributes GRAD allows you to specify line styles and writing modes. Normally, when you draw a line across the screen, you get a solid line. A line style, however, allows you to specify that all lines are dotted lines, or dashed lines, or almost any pattern of dots and dashes you like. Another drawing attribute that GRAD lets you specify is the writing mode. On a pen plotter, a line is a line--you can never erase an existing line. On a graphics screen, however, there are several interesting possibilities. Usually, you want the screen to look like it would on a pen plotter. 
This is called OR mode, since it is accomplished by bitwise ORing the pixels to be drawn with the screen pixels' existing value. GRAD also supports an XOR mode and an AND mode. The XOR mode can be used to "erase" lines, because if you draw a line in OR mode and then redraw the line in XOR mode, the line disappears. This isn't perfect, however; if there is a second line on the screen that intersects the first one, it will have a "hole" in it, because the pixel where the two lines intersected is turned off. You can also use XOR mode to achieve a kind of reverse-video effect, by turning on a block of pixels, switching to XOR mode, then drawing on the block. Drawing lines in AND mode doesn't make much sense, because the only pixels that will get turned on are those that were already on. In other words, it will look as though nothing got drawn. AND mode is useful for Bit-Block Transfers, however. Bit-Block Transfers, or bitblts (pronounced "bitblits"), are at the heart of windowing systems that operate in graphics mode. For example, moving a window from one place to another is a bitblt operation; so is removing a window (copying a block of background pattern to it). GRAD provides basic bitblt operations that allow you to transfer blocks between virtual screens and to and from files. GRAD's bitblt operations obey the current writing mode, so you can combine the block transfers with the bit-wise modes to do things like erase a window or cause a window to appear in reverse-video. Clipping Clipping is the ability to restrict graphics output to a specific (usually rectangular) region of the screen. For example, if you are using an inch-high strip along the bottom of the screen to display status information about your program, you want to ensure that no other part of your output strays into that area. If your graphics library supports clipping, you can define a clipping rectangle. 
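Mechanically, such a rectangle is no more than a test applied before every pixel write; the helpers below are a sketch of the idea, not GRAD's actual API:

```c
/* A clipping region reduces to a predicate checked before each
 * pixel is written; pixels outside the region are discarded. */
struct rect { int left, top, right, bottom; };   /* inclusive bounds */

int in_clip(const struct rect *clip, int x, int y)
{
    return x >= clip->left && x <= clip->right &&
           y >= clip->top  && y <= clip->bottom;
}

/* Write a pixel into a width-pixels-wide byte buffer only if it
 * falls inside the clipping rectangle. */
void put_pixel_clipped(unsigned char *screen, int width,
                       const struct rect *clip, int x, int y)
{
    if (in_clip(clip, x, y))
        screen[y * width + x] = 1;
}
```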
Your program can then continue to draw anywhere it likes, but only that portion of the drawing that lies within the clipping region appears. GRAD allows you to specify a single, rectangular clipping region called a "window". There is no on/off function to disable or enable the defined clipping region. Instead, GRAD supplies a ResetWin() function that redefines the clipping region to be the entire virtual screen (which effectively turns clipping off). Drawing Text Whether you are drawing business charts or flowcharts, you inevitably need to display text along with your graphics. There are two general ways to draw text in graphics mode on a pixel-oriented device. Bitmapped fonts are the kind you normally see in text mode on a PC screen. As the name implies, they are defined in terms of a set of bits that are on or off, each bit corresponding to a pixel of the overall character. Bitmapped fonts are easy to define and fast to display, but difficult to scale up and down in size, and difficult to clip except on character boundaries. Stroke fonts, on the other hand, are stored as line segments and, therefore, can usually be scaled up and down in size, stretched in any direction (to form slanted text, for example), and even rotated to arbitrary angles. GRAD has no stroke fonts but supports bitmapped fonts. These fonts can be stored in files and loaded into memory dynamically, as needed. This is useful when you want to use many fonts but don't want to consume a lot of memory. You can get the effect of rotated fonts (one of four 90-degree rotations) by using a specially rotated font file. There are 18 font files on the GRAD disk. Although most of these are variations on a couple of fonts, they provide good examples of what you can do. You can also make bitmapped fonts that have variable width. This looks more professional, especially with larger fonts. GRAD also supplies a graphics input function that reads from the keyboard. 
This is very handy when you need to query the user while you are drawing graphics, since you will want the keyboard input to be echoed on the screen with graphic text. Remember that just calling gets() probably won't produce the desired result when the screen is in graphics mode. Picture Segments If you are writing a program that allows the user to manipulate the graphics drawn on the screen, you may want to provide a way for them to control units of the picture more complicated than individual pixels or lines. For example, the user of an architectural program may want to move an entire wall (including windows and doors) as a unit. Picture segments support this type of operation. A picture segment is just a sequence of drawing commands that you can store, retrieve, and use to draw the same object in a variety of places on the screen. GRAD does not provide picture segments as such, but it defines a draw() function which is a step in that direction. draw() takes three arguments: a C string containing drawing commands and two integer arguments that can be used to parameterize the commands in the string. The key feature of the graphics commands that you store in the string is that they are relative to the current drawing coordinate. For example, here is a command that draws a rectangle at the current location. Draw("RT10 DN5 LF10 UP5", 0, 0); It always draws the same size rectangle; however, it could be parameterized like this: Draw("RT%OX DN%OY, LF%OX, UP%OY", 3, 10); In this case, the arguments to draw() alter the symbol that is specified in the command string. Notice that you could build up command strings, save them to files, and bring them back later -- just as you would use a symbol library. GRAD just supplies the basic command string ability, though. You would have to design your own functions to manage a symbol library. Graphics Environment If you were going to implement a symbol library, you would want the drawing of symbols to be modular. 
The symbol might draw in a different line style, graphics mode, or font, or use a different clipping window than the calling routine. A modular library would ensure that each symbol routine resets all these attributes back to their original values after the symbol is drawn. Fortunately, there is an easier solution. GRAD groups attributes like the current origin, clip region, line style, font, and so on, into a bundle which it calls an environment. The modular symbol or graphics routine can simply save the current environment before it begins drawing, and restore it after all its graphics operations are complete. Utility Programs The GRAD disk contains several utility programs, which Conrad Kwok wrote as sample programs for the library. The first program, Interp, is an interpreter for GRAD library functions. You can place a series of GRAD function calls in a file, then give that file name as an argument to Interp. Interp interprets the graphics commands and draws the resulting graphic on the screen. This is a fast way to experiment with the library, since you don't have to recompile anything to make changes to what you're drawing. The input to the interpreter mimics the analogous C functions. Whenever a particular function returns a value, you can simply write: var1 = function(val1, val2, ...) The variable that you name (in this case, var1) is created and initialized by the value returned by the function. Similarly, for functions that return values through pointers, you can type something like this: function (&var1, &var2, ...) The variable names available for use are hard-coded in the program, but the source to the interpreter is supplied, so you could easily extend it. Listing 1 shows a sample input file which draws an ellipse around the text "The C User's Journal". MPrint (Merge Print) is a variation of Interp. MPrint allows you to specify a file containing lines of text that are merged with the graphics drawn by the interpreter.
In other words, you can print graphics in graphics mode on the printer and print the text portions in text mode, which is much faster than printing text in graphics mode. Distribution and Licensing The GRAD graphics library is a good, basic, integer graphics system. It contains a complete set of primitives which could be used as a base for a more sophisticated graphics package, a floating-point package, for example. The main disadvantage of the library is that it is not written for multiple graphics adaptors. You must compile the library for a specific adaptor. Conrad Kwok, GRAD's author, is distributing this graphics package as shareware. If you find his program useful, he requests that you send a contribution of $20 to him. If you send $20 or more, you will receive updates to the library. If you send a contribution of $60 or more, you will get the source for the latest version of GRAD, as well as a programmer reference manual which documents the internal data structures and algorithms used in the library. The source is copyrighted. The licensing terms for GRAD are as follows: You may freely copy and distribute the GRAD library and related programs provided the documentation and sample programs are not modified in any way. However, you may write additions to the library and distribute those along with the original library. You may not charge a fee for distributing the library or your enhancements to it. However, you may charge a small fee for the cost of the disk, shipping, and handling. Your program must be in the public domain and must contain a message indicating that it contains code from GRAD, written by Conrad Kwok. If your program does not meet the above requirements, you must get written permission from Conrad Kwok before distributing it. Publisher's Forum To show our appreciation for your readership and to commemorate The C Users Journal's second anniversary, we've bound a combination calendar and reference card into this issue. P.J. 
Plauger prepared the reference card. It summarizes calling conventions for Standard C library functions. Susan, our staff artist, prepared the calendar. We hope you find at least one side useful. This issue begins our third year of publishing The C Users Journal. It also marks our first issue on a monthly publication cycle. Two years ago, when we first combined The C Journal and The C Users Group Newsletter, the Journal was 72 pages and went to 6800 subscribers. This issue of 144 pages will be distributed to over 23,000 subscribers; another 5400 copies will go to newsstand distributors. The magazine and related activities now employ 16 persons -- up from about ten a year ago. To accommodate this extra staff, we've just moved into larger quarters, about two blocks from our old office. (We're all moved in, but we're not yet 100 percent functional. There are still little things missing -- like my terminal, and Donna's doorknob, and Kenji's return air vent, and ...) We think all these signs are cause to celebrate. (Well, maybe all but the moving ... that's pretty traumatic.) Since it's your interest in C that has stimulated this activity, we wanted to share the celebration with you. Unfortunately, it's difficult to coordinate a celebration involving over 23,000 persons scattered around the globe. We considered mailing you each a party favor with instructions about when to toot your whistle, but the reference card seemed more practical. If nothing else, we're always practical. (Personally, I'm celebrating by trying to catch up on some lost sleep.) We hope you like the card. We offer it with our heartfelt gratitude; thanks for reading the magazine, thanks for writing for the magazine, and thanks for advertising in the magazine. We'll be doing our best to earn your continued participation. Sincerely yours, Robert Ward Editor/Publisher New Products Industry-Related News & Announcements Oasys Offers Green Hills C++ Oasys, Inc. 
has introduced the Green Hills C++ compiler, which supports cross and native mode development. Green Hills C++ is integrated with the Oasys 680x0 and 88000 Cross Tool Kits, enabling embedded systems developers to take advantage of object-oriented techniques. Green Hills C++ supports Kernighan and Ritchie C and complies with the ANSI C standard. Green Hills C++ provides object oriented programming features such as data abstraction, strong type checking, and overloading of function names and operators. New C++ features include classes with scope, and overloading new and arrow operators. Green Hills C++ also includes compiler optimizing techniques such as inlining, loop unrolling and register caching. The Green Hills C++ compiler is available from Oasys on the Sun-3. Oasys claims that the compiler will be ported to other UNIX workstations and minicomputers soon. Oasys supports Designer C++, the C++ translator developed by Glockenspiel, Ltd. Oasys will provide current customers with the ability to upgrade to the Green Hills C++ compiler. For more information contact Oasys at 230 Second Ave., Waltham, MA 02154 (617) 890-7889; FAX (617) 890-4644. Library Brings UNIX Functions To Hercules Card Users Certified Scientific Software has announced a subroutine package that allows programmers using most PC-based UNIX systems to take full advantage of Hercules-type monochrome graphics adapters. The package includes the standard UNIX plot(3) subroutines plus many enhancements, such as patterned fills of circles, rectangles and user-defined shapes; two fonts -- 8x8 pixel and 8x16 pixel -- for labels; clipping windows; five pixel write-modes, including bit-set, bit-clear and exclusive-or; and routines to support double buffering using the Hercules adapter's two graphics pages, making animation effects possible. The subroutines use only integer code, so they will run efficiently whether or not floating-point hardware is installed. A 10-page manual and demonstration C code are included.
The package is currently available for Interactive Systems 386/ix; AT&T System V/386; Microport System V/AT; XENIX 286 v2.2/2.3 and 386 v2.3; and VENIX v2.3/2.4. A single-user license is priced at $99, plus $2 shipping and handling. The subroutines may be licensed for incorporation in programs for resale by special arrangement. For more information or a review copy, contact Certified Scientific Software, P.O. Box 802168, Chicago, IL 60680 (312) 326-6098. Send e-mail to: UUCP: {seismo,harpo,ihnp4,linus,allegra}!harvard!certif!herc INTERNET: certif!herc@harvard.harvard.edu Screen Manager Professional Updated To Version 1.5B Logical Alternatives, Inc. has released version 1.5B of the Screen Manager Professional for C programmers. The S.M.P. is a tool box of over 150 pre-written functions for complex windowing, menu generation and interactive context sensitive help features. To maximize performance and minimize memory overhead, the windowing functions are written in assembly language. The smallest possible program size using the S.M.P. functions is approximately 7K. The menu system, on the other hand, is written in C, providing flexibility and allowing the programmer to customize the functions. Other features include: keyboard filtering for data entry systems, OS and compiler independence, full video support, background processing, reconfigurable memory allocation, and a 300-page ring bound manual. This product also includes an event driven mouse support system, which makes S.M.P. comparable to a text-based Microsoft Windows programming interface. Full technical support is available including a new bulletin board for professional programmers: The LAB (814) 234-1881. The introductory price for S.M.P. v1.5B is $250 (with source code, $350). Screen Manager Professional supports Microsoft C, Borland's Turbo C, Watcom C, Lattice C, and Zortech C++. For more information contact Donald McCandless, Marketing Director, Logical Alternatives, Inc., Calder Square, P.O.
Box 10674, State College, PA 16805 (814) 234-8088, BBS: (814) 234-1881, FAX: (814) 234-6864. TE Version 3.0 Announced Sub Systems, Inc. has released TE Developer's Kit v3.0. The new version includes a TES small window editor routine. An application program can utilize TES without programming changes to the routine. The application program passes a set of parameters which specifies the window coordinates, maximum file size and an input buffer or an input file. The output is either a buffer or a file. The TES routine supports screen scrolling functions, word-wrapping, and block commands. It requires 60K of memory and supports Microsoft and Borland C compilers. The package includes the complete source code. This version of TE Developer's Kit retains the TE text editor source code and library routines from the earlier version. The package lists for $125. For more information contact Sub Systems, 159 Main St. #8C, Stoneham, MA 02180 (800) 447-6819 or (617) 438-8901. Powerline Updates Source Utilities Powerline Software, Inc. has released new versions of their programming utilities Source Print v4.0 and Tree Diagrammer v3.0. Powerline has added graphics drivers to support over 400 printers. The new features include support for many printers (including laser printers) and support for C, Pascal, and dBASE from a variety of language development companies. Both Source Print (a source code formatting utility) and Tree Diagrammer (an "organizational chart" diagrammer) are software tools for all PC programmers coding in C, C++, dBASE, Pascal, BASIC, FORTRAN, and Modula-2. For more information contact Powerline Software Inc. at their new address: 826 Douglass Street, San Francisco, CA 94114 (415) 346-8325. Emulator Mimes Xenix Console Hansco Information Technologies, Inc. has released its new terminal emulator system, HIT/Ansi. HIT/Ansi is a memory-resident program for MS-DOS compatible computers that emulates the Xenix color console.
The program may be called up while running any MS-DOS application with a hot key so that the computer functions as a terminal to a host Xenix machine. When the hot key is pressed again, the computer returns to MS-DOS and to whatever program was running. Using less than 48K of RAM, HIT/Ansi supports color (CGA, EGA, and VGA) or monochrome systems, 12 function keys and local printers in the foreground or background through the parallel port. A descriptive brochure and demonstration diskette for the product are available upon request. For more information contact Hansco Information Technologies, Inc., 185 West Ave., Ste. 304, Ludlow, MA 01056 (800) 548-9754 or (413) 547-8991. Saber And TI Join Efforts Saber Software, Inc., developer of Saber-C, has announced a joint software development agreement with Texas Instruments, Inc. Engineering teams from both companies are using Saber-C to cooperatively develop new software technology that will be used in software products TI and Saber plan to introduce in the future. Texas Instruments will also use Saber-C widely for its own internal development projects. Saber-C runs on UNIX, Sun Microsystems Sun-3, Sun-4, Sun 386i and SPARCstation workstations. Saber-C is also available for DEC's VAXstation and Ultrix. For more information, contact Saber Software, Inc., 185 Alewife Brook Parkway, Cambridge, MA 02138 (617) 876-7636; FAX (617) 547-9011. Watcom Ships v7.0 For 386 Hosts Watcom is now shipping the Watcom C v7.0/386 optimizing compiler and run-time library for the Intel 80386 architecture. Already available for the 16-bit MS-DOS environment with the 80X86 processors, Watcom C v7.0 is now available for the 32-bit 80386 processor. Watcom C v7.0/386 ports MS-DOS applications to 32-bit native mode, enabling full 386 performance without 640K limitations. Watcom C v7.0/386 generates code for 32-bit protect mode and can access large data areas without source modification or special compiler options.
Watcom C v7.0/386 takes advantage of 386-specific instructions, sophisticated addressing modes and 32-bit linear addresses. Porting to the 386 architecture involves recompiling existing programs and linking with the 386 library to enable addressing of up to 4 gigabytes of memory. Applications compiled with Watcom C v7.0/386 operate with MS-DOS extenders which enable use of 80386 protect mode. Both the 80386 software tools from Phar Lap Software and OS/386 from A.I. Architects support use of Watcom C v7.0/386 32-bit protect-mode with MS-DOS. Watcom C v7.0/386 includes the compiler, run-time library, a "compile and link" utility, Touch utilities, an object file disassembler, a patch utility, and the Watcom C Preprocessor. The list price for Watcom C v7.0/386 is $895. For more information, contact Watcom at 415 Phillip Street, Waterloo, Ontario, Canada, N2L 3X2 (519) 886-3700, FAX (519) 747-4971, or call the Watcom C order and inquiry line toll free: (800) 265-4555. Sterling Castle Offers Logic Gem In Single Language Versions Sterling Castle is shipping a "single language edition" of Logic Gem v1.5, its logic processor and code generator. This edition includes one of BASIC, FORTRAN, Pascal, dBase and C, plus English for documenting procedures, writing pseudocode, and building rule bases for expert systems. The products are identical except that one programming language choice appears in the language menu instead of five. LogicGem includes an editor, interpreter and compiler and runs on PC, XT, AT, PS/2 or compatibles. LG requires 640K of RAM, PC/MS-DOS 2.0 or greater and can be used with a color or monochrome monitor. LG's "Programmer's Edition" complete with documentation has a suggested retail of $99. The single language edition, sold only directly from Sterling Castle, is $49.95 with complete documentation and on 3.5" or 5.25" disks. The full purchase price of the single language edition is applicable against a later purchase of the multi-language programmer's edition.
There is a 90-day money-back guarantee, free technical support and 24 hour bulletin board service. Upgrades to v1.5 are free to registered users. Contact Sterling Castle, 702 Washington St., Ste. 174, Marina Del Rey, CA 90292. Inside CA (213) 306-3020 or (800) 323-6406; Outside CA (800) 722-7853; FAX (213) 821-8122. CI Adds Profiler To QNX Computer Innovations has added a new utility which provides statistical profiling of a program to the Computer Innovations C86 C Compiler for QNX. The profiler points out the parts of the program that use the most CPU time, reporting in terms of source file constructs that the programmer can easily relate to: module, function, or line number. The profiler is currently included with the C86 C Compiler package, and is available for downloading (by registered C86 users) from the Computer Innovations Bulletin Board Update System. For more information contact Computer Innovations, Inc., 980 Shrewsbury Ave., Tinton Falls, NJ 07724 (201) 542-5920. Spell Checker Works With C Geller Software Laboratories, Inc. has introduced SpellCode, a spell checker. SpellCode works with C, Pascal, BASIC, databases and Lotus spreadsheets as well as dBase and all work-alike interpreters and compilers. SpellCode includes a comprehensive English dictionary and a special dictionary of common computer terms. The user can also create as many custom dictionaries as needed. It is available from Geller Software Laboratories, Inc., 35 Stephen St., Montclair, NJ 07042 for a special introductory price -- $49.95. For more information call (201) 746-7402. MetaWare Available On System V/386 MetaWare's High C compiler will be offered on the Santa Cruz Operation (SCO) and AT&T UNIX System V/386 operating systems. The High C compiler features over a dozen different global optimizations, including global allocation of values to registers, removal of invariant expressions from loops, live/dead analysis, dead code elimination, and constant and copy propagation.
MetaWare's High C compiler also features a code generator that makes use of 386/387 instruction sets including support of in-line transcendentals and floating-point long doubles (80 bits). The code generator also features in-line intrinsic functions; in certain cases, the compiler replaces a call to the C library with the actual in-line instructions, resulting in code that is smaller and performs fewer operations. The High C compiler provides ANSI compatibility, cross-language calling, accurate and helpful diagnostics, and maximum configurability. Developers can select from a wide variety of compiler features through the use of toggles and pragmas. MetaWare supports the complete Intel 80x86 microprocessor family including the 8086, 80186, 80286, 80386, and 80486, and the Intel i860; Advanced Micro Devices' Am29K; Sun Microsystems' Sun386i, Sun-3, and Sun-4 workstations; Motorola's 680x0 family of processors; IBM's PS/2, RT, and 370; and DEC's VAX. Operating system support includes UNIX 4.x BSD, UNIX System V.x, SunOS, IBM's AIX, DEC's Ultrix, MS/PC-DOS, OS/2, DRI's FlexOS, AIA's OS/286 & 386, Phar Lap's 386DOS-Extender, DEC's VMS, and others. Most platforms are supported with native and cross compilers. For more information contact MetaWare Incorporated, 2161 Delaware Avenue, Santa Cruz, CA 95060-5706 (408) 429-6382; FAX (408) 429-9273. FairCom Announces Update For c-tree File Handler FairCom has announced c-tree File Handler/Server v4.3, which provides functions to store, update and retrieve fixed or variable length data in random or sequential order. c-tree comes with source code and employs a portable client/server architecture. The new version has a high speed sorted key load routine enabling virtually linear time index creation regardless of the number of index entries. Another function returns the key value at an approximate given percentile of the ordered key value list. The new version also estimates the number of entries between two key values.
c-tree v4.3 has new make files and scripts for OS/2, Watcom, MPW v3.0 and Commando tool support for all of MPW. There is server support for LightSpeed C on the Mac and server/client support for Turbo C. Reuse of depleted nodes in single-user and c-tree Server modes of operation is possible. Version 4.3C lists at $395 (plus shipping and handling). To order contact FairCom Corp, 4006 W. Broadway, Columbia, MO 65203, (800) 234-8180; FAX (314) 445-9698. Coromandel Releases C-Trieve For MS-Windows Environment Coromandel has announced the release of its C-Trieve ISAM file manager for MS-Windows. C-Trieve/Windows, now shipping, is based on the X/Open standard. It also runs under MS-DOS, XENIX, UNIX and DESQview. C-Trieve can be used by both C and C++ programmers. C-Trieve/Windows is a library of routines that allows the programmer to build custom data management applications. C-Trieve/Windows is based on a client-server model. A single server can support multiple clients and maintain application integrity using locking and transactions. C-Trieve/Windows is based on C-Trieve, which is the native file manager of Coromandel's RDBMS, C-SQL. The current offering includes dBase and Btrieve. C-Trieve users can upgrade to C-SQL and continue to use their files; no need exists to translate or modify the data for SQL access. For more information contact Coromandel Industries, Inc., 108-27, 64th Road, Forest Hills, NY 11375 (718) 997-0699; FAX (718) 997-0793. Eigenware Tech Offers CSL Buyer's Guide Eigenware Technologies now has available a 45-page buyer's guide for the C Scientific Programming Library. This guide provides a description of the CSL product and several other related products and services. These other products include compilers, editors, technical monographs, and TeX typesetting software used for CSL documentation. Detailed ordering and international shipping information is also supplied in the buyer's guide.
The guide is available for $5 from Eigenware Technologies, 13090 La Vista Drive, Saratoga, CA 95070. For more information call (408) 867-1184. QuickGeometry Receives Upgrade Building Block Software has released QuickGeometry Library v1.01, a collection of math subroutines for developing CAD/CAM, parametric design, NC programming, post processing, finite element analysis or other similar programs. The major enhancements are the addition of support for Turbo C, and internal changes that simplify interfacing to graphics libraries. The QuickGeometry Library provides CAD/CAM programmers with routines for standard geometric operations required for CAD/CAM software development. In addition, the QuickGeometry Library provides routines that read and write DXF files, and that manage lists. Selling for $199, the product includes source code, object code for MS-DOS, extensive documentation, working example programs, one hour of telephone support and a 30-day money-back guarantee. For more information contact Building Block Software, PO Box 1373, Somerville, MA 02144 (617) 628-5217. We Have Mail [Editor's Note: Yes, we omitted the listing from last month's letters column. It appears as a separate article in this issue, Dealing With Memory Allocation Problems. -- rlw] Dear Sir, It has been many years since I sent a letter to a periodical, some 16 or 17 years to be precise. I have some 22 years of programming background, ranging from systems programming to applications and telecommunications. As the original designer and author of SHADOW (IBM mainframe telecommunications system), and co-designer of MANAGE-IMS, I feel I can speak with some experience. I mention my background not to attempt to impress, but to add some weight to my words about the latest fad in the C world: C++. When C was first inflicted on us I welcomed it and disliked it; however, two facts stand out. First, K&R are undoubtedly very bright people with much insight.
Secondly, ANSI cleaned up the loose ends and now C is a serious commercial language. C is now one of the four languages that the IBM SAA endorses. I have written in C since 1982, using MVS, UNIX and the micro versions. Many years ago we in the mainframe world discovered the benefits of control blocks, pointers and vector tables. In fact the control block structure of any dynamic operating system is, no ifs, no ors, no buts about it, an object oriented programmed system. This "new" concept of O.O.P. (object oriented programming) is what worries me. First, it is not new. We have used object oriented systems for all the 22 years I have been in the industry. I have a fear that OOP will become OOPS. I feel that as far as C goes, C++ is violating the cardinal rule "IF IT ISN'T FIXED, DON'T BREAK IT"! I have studied OOP systems, the new window systems are OOP, and on the whole well done. They exist without the ?benefit? of C++. As it stands, C supports objects very well. I have an example in the C language forum of Compuserve, complete in and of itself for anyone who cares to study it. In short, C++ is a farce. C++ I feel was implemented by some well intentioned people who have no serious commercial programming expertise, and certainly no IBM mainframe internals experience. C++ is a random collection of items, a mixed bag of minor changes, and the OOP extension. The minor additions attack the heart of structured programming (for example allowing data to be defined anywhere code may exist). They had some good ideas, existing for a quarter of a century in the mainframe world, such as defaults. Yet the defaults are positional as opposed to pure keyword! When keyword parameters are introduced into functions and macros then a whole new world is opened up. C++ felt it was better to stick to methods flying in the face of good mainframe experience and thus limit its abilities.
The data reference, the change to casting, the inline functions are questionable at best, and ignore the potential increase in power of the processor and the optimisation ability of future compilers. Programmers are made to get involved with optimisation, not the machine. Overloaded functions I admit are a benefit. They are the base of the C++ object implementation. I ask myself if that benefit isn't perhaps the only benefit of C++. The object oriented side of C++ does nothing, except inheritance, that any C compiler today can do. And if serious preprocessors were defined with global symbols then inheritance can also be implemented. What I am saying is that rather than C++, let us have a full preprocessor with typical mainframe abilities, and skip the rest. C was designed to be bare bones, enhanced (very successfully) with functions. The quantum jump should be a preprocessor and proper macro and language preprocessor such as the IBM assembler macro facility. The next quantum leap is not the poorly thought out ideas of C++. In creating an object based system, much thought has to go into the structure, and this is true whether one uses C++, with its inheritance and scope, or C, where run time inheritance and binding are easily controlled in other ways. I am getting suspicious that perhaps AT&T felt it was losing control of its brilliant child, "C" and needed to show that perhaps they were still in the lead. I suspect that since OOP was becoming more the rage that they jumped on the bandwagon. They used that to reestablish their leadership. The C++ authors wanted to become the next generation of venerated programmers, to be the next K & R. I am sorry, but as Senator Bentsen put it, "they are no K&R". OOP was not invented by AT&T, it is a long established method for handling interrupt and interrupt driven systems. The resurgence of OOP came about with among other things the need to handle the dynamic world of dynamic objects such as in windowing systems and the like.
OOP is a good discipline where applicable. It has many uses in the distributed processing world of the future. I hope that the readers will take a closer look at C++ and study some OOP systems implemented in C and realise that C++ is a farce, a joke being perpetrated on the data processing world. I am all for positive change, this isn't it. I am recommending to my company that C++ not be implemented. I note that there will be no ANSI C++, they have seen the light. I thank you for your patience, Simon Wheaton-Smith 2902 N. Manor Dr. West Phoenix, AZ 85014 You're welcome to my patience, but not to any support for your position. I wonder if K&R had any IBM mainframe internals experience? If not, perhaps we should make them rescind C? -- rlw Dear CUJ, Please allow me to introduce myself. My name is Chris Proctor. I'm an IBM mid-range systems contractor. I felt compelled to write you a letter to tell you why I would not be renewing my C Users Group subscription. I am relatively new to C programming and I was hoping that your magazine would provide me with helpful hints and programming tips that would help me become a better C programmer. Unfortunately, in most issues I found nothing that was beneficial to me. Please believe me when I tell you that I am not "knocking" your magazine at all. I'm sure that if I was more knowledgeable in C, your magazine would be very interesting. But, quite frankly I don't understand half of the articles in each issue. What I would like to see is an article or section of each issue dedicated to the basics of C, or at least programming tips that the layman can understand. I can't believe that I am the only one that has not renewed my subscription because the articles are "over my head". Perhaps, something like I have mentioned may even increase subscriptions just from people glancing through the C Users Journal on the magazine rack. 
I realize that you have to appeal to the masses and not the exceptions and if that's the case, I'll probably subscribe to the magazine when I feel that it would be of some use to me. You have an excellent magazine. Keep up the good work. Sincerely yours, Chris Proctor 21352 Avenida Ambiente El Toro, CA 92630 I too would like to see some quantum of good tutorial material in every issue, in addition to the more demanding copy. Unfortunately, we don't get very many well-written tutorial submissions. If my readership includes some willing but uninspired authors, here's your chance. Send us a concise but thorough tutorial on some aspect of C. We need more such submissions than we are currently receiving. -- rlw Dear Howard, I was pleased that my article, "The C Programmer's Reference: A Bibliography of Periodicals," appeared in print in your January, 1990 issue. However, I was dismayed to learn that I had inadvertently omitted a couple of worthy entries. These annotations, with the appropriate citations, are as follows: C Gazette (quarterly, $6.50/issue, $21.00/year) C Gazette, 1341 Ocean Avenue #257, Santa Monica, CA 90401. A "code-intensive" quarterly which thrives on printing lots of C code (and some C++). Specializes in MS-DOS and OS/2, but no UNIX. An in-depth publication aimed at intermediate and advanced C programmers. Few advertisements and few reviews. For programmers who are serious about their C code. Journal of C Language Translation ($235.00/year) Journal of C Language Translation, 2051 Swan's Neck Way, Reston, VA 22091. An academic quarterly which just recently commenced publication. Aimed at compiler writers and programmers who must implement the ANSI standard in language products. Covers extensions to the standard, such as implementation of numerical representation, etc. No advertisements and few reviews. An important resource for programmers in this narrow niche. 
I had compiled the original bibliography some time ago, and from the holdings of a corporate library. I assumed that the library's holdings were relatively complete, and I overlooked the two periodicals above. I hope that this letter will fill the gap. I regret it if anyone was offended, and I trust that this information will further assist readers of The C Users Journal in their language research. Sincerely, Harold C. Ogg Chicago State University The Paul and Emily Douglas Library Ninety-Fifth Street at King Drive Chicago, Illinois 60628-1598 (For those wondering, Howard is our editorial coordinator. I should let him respond to this letter, but he's buried somewhere under some manuscripts and pasteups.) I appreciate the information. In addition to his column for CUJ, Rex Jaeschke also writes a C column for DEC Professional -- not a "C magazine", but at least another C resource. If you regularly refer to a C-related information source we failed to include, please write and we'll mention it here in a future issue. -- rlw Dear Mr. Ward: I'm glad the C Users Journal is starting to publish articles on the Macintosh, its development environment, and its operating system. Keep 'em coming! Nice article by Allan Brown [Bruton] in the October '89 edition. True, the Macintosh toolbox does add some additional complexity, but once one becomes accustomed to it -- and it may take quite a bit of time becoming fluent in "toolboxese" -- one can be assured that, by following the development guidelines and using the toolbox calls for window manipulations, there is less likelihood of code obsolescence and greater possibility of code portability among the various Macintosh hardware platforms and operating systems. Anyway, I tried executing the code presented on page 99 (Listing 1), and the code as written does not draw a set of nested rectangles as promised at the beginning of the article. 
When one executes the code specified in Listing 1, nested triangles are drawn on the screen. To obtain nested rectangles, the variable yb will have to be initialized to read yb = 25; rather than yb = 300; as printed in the article. That's the only change necessary for having the Macintosh draw nested rectangles. Thanks again for printing an article of interest to programmers who program the Macintosh in C. Yours truly, Clifford J. Campo 123 Fennerton Road Paoli, PA 19310 Gee, you mean rectangles have four sides? Maybe I should spend more time watching Sesame Street with my son. Thanks for the correction, and thanks for noticing our Macintosh coverage. We've really worked hard to get those stories. -- rlw Dear Robert: I'd like to offer several comments on your "Publisher's Forum" in the August 1989 issue. I like the new glossier paper; I think it makes the pages easier to turn because there's less friction between them. Goodness knows, we readers don't want too much friction. (Truly, I do like it better.) I can't tell you what a relief it is to read that you're refusing to get involved in C puns. At least in your articles. Your advertisers more than make up for it. (Of course, it's not just CUJ advertisers...) Too bad X3J11 didn't outlaw C puns as part of the ANSI standard. Regarding swimsuits, etc.: I agree that would be out of place in CUJ. There's plenty available elsewhere. However, your comment, "Wouldn't you rather explore lex than sex?" leaves me concerned. Have you somehow arrived at the assumption that real programmers are so obsessed with digital high tech that they will forego sex? Of course not. How do you think we burn off all of that Jolt and pizza? Not at a keyboard, surely! 
Speaking of sex and assumptions, and here I am finally being serious, there's a big one or two in your comment, "We've even considered running pictures of all the staff (especially the women since most of them are single and most of you are male)," namely that all male CUJ readers are straight. I assure you, it ain't so! About 10% of most any population is gay and lesbian, and while I haven't seen any polls to confirm that this is true of programmers, I have no reason to believe otherwise. So, if you were to do swimsuits, it would only be fair to include your female and male staff. Fair to your straight women readers too, don't forget them! CUJ is great, please keep it up (speaking of standards and high ones at that)! Sincerely, Bill Lee 5132 106A Street Edmonton, Alberta, CA T6H 2W7 What can I say? -- rlw To The C Users Group, Concerning Numerical Software Tools in C. It is a fine book for those starting to program in C. For any book in your Advanced topic area, I, as well as all others, assume that Advanced means just that -- Advanced! An advanced book would be like Numerical Recipes in C by Press et al. from Cambridge University Press. You truly need to re-analyze what is considered advanced, given that more and more books actually treating advanced topics are coming out. In the past, few knew anything about C. Since it is now the #1 language of choice, advanced isn't the advanced of yesterday. The book which I'm sending back should be considered elementary to intermediate. That it was published in 1987 does not mean that it is advanced. Further, four routines of the most elementary type do not, in my view, constitute "Tools". Tools to me are a compendium of primitives that one may use in developing one's own applications. This book falls way short of that. Again further, the price is outrageous for what one receives. Jerry Rice, Ph.D. 504 Eastland St. El Paso, Texas 79907 In all truth, I haven't read this book. 
In fact, there are more than a few books among the 100 or so that we carry that I haven't read. Except when I have personal knowledge of the book's contents, we rely upon the publisher's description when categorizing the book. -- rlw User Interface Language Eases Prototyping Vincent Guarna and James Krause This article is not available in electronic form. Using 'Screen Machine' Rick Knoblaugh Rick Knoblaugh is a Systems Engineer specializing in systems programming for PCs. He is the coauthor of Screen Machine, a screen design/prototyping/code generation utility. He may be reached at 15014 River Park Dr., Houston, TX 77070. Prototypes and code generators can significantly reduce development costs. In this article I'll discuss a recent consulting project and show how the "Screen Machine" -- a prototyping tool which I am making available to other programmers as shareware -- assisted in prototyping, generating C code for the user interface, and documenting the system. The Application My project was a student grade tracking application for a high school. The software allows student names and grades to be scanned into a PC clone using an optical mark reader, a scanning device which reads forms which have been marked with a pencil. Student names and grades can also be manually entered or edited. The product enables teachers to maintain their grade books on a PC. Grade tracking and printing tasks, such as letters to parents, are all handled in a menu-driven environment. Thus, the application required menus, data entry screens, and help screens. I began by planning the major components of the software, such as the scanner communications and the decoding of the scanned data. Next I needed to develop a user interface from which all program functions could be selected. For this phase the user interface prototyping software was invaluable. 
Benefits Of Prototyping In the past, programmers who developed interactive programs painstakingly designed the appearance of screen displays on paper and then wrote the code for these user screens. Today, many developers are using some type of screen prototyping software. Most prototyping tools permit screen design using a powerful screen editor. Screen editors make it much easier to manipulate blocks of data, to center screen data, and to experiment with color and other aspects of screen appearance. In addition to a screen editor, prototyping packages usually include some control facility that allows branches to various screens to depend on user input. This allows the developer to create the "look and feel" of a user interface before any code is written. Prototyping also lets the user become more involved in the design of the user interface. More importantly, it allows the programmer to be more creative and to develop an interface that makes sense. Some prototyping tools also provide code generation for the screen displays. Once the screen design is finalized, the program automatically generates the associated source code. Screen Machine Screen Machine runs under MS-DOS and consists of a screen editor/code generator, a mini-language for prototyping the flow of application screens, and a TSR screen capture program which allows any text mode screen to be imported into the screen editor. Screen Machine can generate source code for screens in your choice of C, BASIC, Turbo Pascal, 8086 assembler, and dBASE. Screen Machine is limited to handling display portions of screens only; it does not handle data input. The prototyping module permits the input of single keystrokes, allowing screens to be displayed when the operator selects a menu option or presses a specific key. Designing Screens With SCREEN I experimented with the appearance of the grade tracking application screens using Screen Machine's screen editor and code generator, SCREEN.EXE. 
As with most applications, I started with the main menu (Figure 1). The SCREEN box drawing feature makes it easy to put borders around menus and other screens. Text can be centered on a given line of the screen or within the graphics character borders of a drawn box. You can even shift the entire screen left or right to aid in centering screen data and attributes. Other screen editor features include: inserting and deleting lines, copying and moving blocks, selection of color, reverse video, undo of last editing function, keystroke macros, and online help. I saved my designed application screens in Screen Machine screen data files. (Screens can be saved with or without attributes.) If no color or reverse video is needed, the screens can be saved as ASCII text files. Prototyping The Interface Once the data files for all application screens are complete, the programmer develops an executable simulation of the application interface using Screen Machine's mini-prototyping language module, SHOW.COM. The completed simulation will display the main menu, accept keystrokes, and based on these keystrokes, select other application screens for similar processing. The SHOW mini-language consists of display/keystroke input statements, case statements, and goto and gosub statements. The heart of these is the display/keystroke input statement, whose syntax is:

Filespec [basekey max] [/Tn] [/An] [/Xn]

Filespec names the screen data file to be displayed. (e.g. I saved my main menu screen data file in C:\GRADE\MAINMENU.SRN.) The basekey is optional and represents the lowest-valued key accepted as input from the user when the screen is displayed. The basekey is one of these: A specific key, enclosed in quotation marks (e.g. "1"). A decimal scan code value (unquoted) (e.g. 59 for the <F1> key). An unquoted asterisk (*), which is taken to mean "any key". The max cannot be specified unless basekey is specified; it is the highest-valued key accepted as input. 
If input from a given screen falls neatly within a range of keystrokes (e.g. if on my main menu only "1" to "9" were used, and not <Alt><H>), specifying basekey and max eliminates all unwanted keystrokes. The T switch specifies a time value in seconds -- useful for creating timed "slide shows". SHOW will display the screen data file and then wait n seconds (0-255) before displaying the next screen. The A switch displays a screen data file in a certain attribute. This is generally only used if you have not saved attributes in your screen data files. The X switch is the key on which a "getout" is performed. "n" is specified in the same manner as basekey and max, i.e. either a quoted character or an unquoted scan code. A "getout" is accepted as a valid key press and performs any pending return or else returns to the operating system. Case statements allow branches to other portions of a SHOW command file to depend upon keystrokes input via the display/keystroke input statement. The syntax for the case statement is:

case [key] [range] [S: G: R:] [label name]

If a keystroke matches key or falls within range, control is transferred to label name. If S: is present, the transfer is executed as a gosub, meaning the address of the next display/keystroke input statement is put onto the "stack" and control is transferred to the label. G: does a goto transfer to the label. R: returns to the label (similar to BASIC). The syntax for labels is the same as in MS-DOS batch files (i.e. a ":" followed by a label name). The grade tracking SHOW command file appears in Listing 1. The top of the command file displays the main menu which is stored in the Screen Machine screen data file, MAINMENU.SRN. The asterisk after the file name instructs the SHOW program to wait after displaying the main menu until some key is pressed. The /X indicates that if a 9 is pressed, the SHOW command file should terminate and return to MS-DOS. The case statements perform gosubs to other labels in the command file. 
For example, if the user presses a 6, SHOW will gosub to the label otherprint where the print options menu is displayed and processed. The strange-looking NUL screen data file name followed by the case * G:top is necessary because the limited SHOW command set only allows unconditional branching to be initiated in case statements. Case statements can only be performed after a screen data file has been displayed by a display/keystroke input statement. The reserved screen data file NUL merely satisfies the case statement by simulating a screen display and a keystroke entry. The asterisk indicates that if any key is pressed, a goto should be performed to the label top. After the appropriate gosub is processed from the main menu, control transfers back to the top of the command file. Generating The Source Code A SCREEN program configuration option allows you to select the language to be generated. When SCREEN generates C code, it declares a structure _scrn and defines a global array of structures of type _scrn (Listing 2). Notice that the array of structures is named with screen_ followed by the name of the screen data file to prevent naming conflicts. After including these statements in your program, you can either write a routine to display the arrays of structures, or include the routine supplied with Screen Machine in your program as in Listing 3. The routine uses the BIOS software interrupt 10h function 9 to display the arrays of structures. Function 9 writes a character and attribute at the current cursor position. The Microsoft C library function _settextposition is used to position the cursor. The function disp_screen is called passing the name of the array of structures to be displayed and a flag indicating whether the screen should be cleared prior to displaying the data. disp_screen clears the screen using the background color defined in the variable color_back_grnd. This should be set to the desired background color. 
Data Entry Because Screen Machine handles only display portions of screens, I used my own general-purpose data entry routines for those portions of the application where data entry was required. Screen Capture A third Screen Machine module, CAPTURE.COM, is a TSR program that allows text mode screen displays to be captured and stored on disk. This utility makes it easy to include application screens, complete with sample data, in the user's manual. CAPTURE takes over the shift print screen function (interrupt 5). When the program is invoked and becomes resident, command line options specify the file name under which screen data files are to be stored and whether or not attributes should be included in the screen data files. If attributes are not desired, screen data is stored in ASCII text files. Captured screens can also be used as input into the screen editor/code generator. This means that any text mode screen can be translated into source code which will display that screen in any or all of the five supported programming languages. This capability can be used when translating an application from one language to another or if you want to generate source code for screens created with a prototyping tool that doesn't support source code generation. Conclusion Screen Machine does several things satisfactorily. Its lack of support for input fields may preclude your using it for some applications. Certainly, if you need really detailed simulations of your programs, such as sound effects and emulation of disk I/O, you should use a more full-featured commercial prototyping program. Also, if you require a graphics interface, then Screen Machine will not help you.

Figure 1

Grade Book Main Menu
1) Scan Grades
2) Edit/View Grades
3) Print Grade Book
4) Scan Names
5) Print Rosters
6) Other Print Functions
7) Set Teacher Information
8) Drop Lowest Grade
9) Exit
For help, press <Alt> <H>.

Listing 1

/*SHOW command file for grade tracking program. 
/* ----------------------------------------------
:top
mainmenu.srn * /x"9"      /*display main menu, accept any key,
                          /*exit to dos if "9"
case "1" s:scangrades     /*gosub to display the appropriate screens
case "2" s:editgrades
case "3" s:printgrades
case "4" s:scannames
case "5" s:printrosters
case "6" s:otherprint
case "7" s:setteacher
case "8" s:droplow
case 35 s:mainhelp        /*<Alt><H> help
/*When all gosubs return, branch back to top. You can only branch
/*as part of a case statement and you can only have a case statement
/*after a display/keystroke input statement. Thus, the special NUL
/*screen name can be used to branch anytime.
nul                       /*special reserved display/keystroke input statement
case * g:top              /*branch back to top of command file
/*-----------------------------------------------
:scangrades
scangrad.srn *            /*display scan grades screen and wait for a key
case * r:                 /*return
/*-----------------------------------------------
:editgrades
editgrad.srn * /x1        /*display edit/view grades screen and wait for
                          /*a key, return to caller if esc (scan
                          /*code 1) is pressed
case 35 s:edithelp        /*if <Alt><H> (scan code 35) is pressed, go
                          /*display the edit/view grade help
nul
case * g:editgrades       /*go back to edit/view grade screen
/*-------------------------------------------------
:printgrades              /*display print grades screen
prtgrade.srn *
case * r:
/*-----------------------------------------------
:scannames                /*display scan names screen
scanname.srn *
case * r:
/*-----------------------------------------------
:printrosters             /*display print rosters screen
prtrost.srn *
case * r:
/*----------------------------------------------
:otherprint               /*display other print options menu
prtmenu.srn "1" "6" /x"6" /*accept only 1-6, return to caller if 6
case "1" s:report1        /*branch to report 1 screen
case "2" s:report2        /*branch to report 2 screen
case "3" s:report3        /*branch to report 3 screen
case "4" s:report4        /*branch to report 4 screen
case "5" s:report5        /*branch to report 5 screen
nul
case * g:otherprint 
/*------------------------------------------------
:report1
report1.srn *
case * r:
/*------------------------------------------------
:report2
report2.srn *
case * r:
/*------------------------------------------------
:report3
report3.srn *
case * r:
/*------------------------------------------------
:report4
report4.srn *
case * r:
/*------------------------------------------------
:report5
report5.srn *
case * r:
/*------------------------------------------------
:setteacher
setteach.srn *            /*display set teacher information screen
case * r:
/*------------------------------------------------
:droplow
droplow.srn *             /*display drop lowest grade screen
case * r:
/*------------------------------------------------
:edithelp
edithelp.srn *            /*display edit/view help screen and return to
case * r:                 /*caller when any key is pressed
/*------------------------------------------------
:mainhelp
mmhelp.srn *              /*display main menu help screen and return to
case * r:                 /*caller when any key is pressed

Listing 2

struct _scrn {
    char *chrs;           /*pointer to screen text*/
    char cw;              /*column where text appears*/
    char rw;              /*row where text appears*/
    char att;             /*attribute in which text appears*/
};

struct _scrn screen_mainmenu[]={
    /* ...array contents as shown in Listing 3... */

Listing 3

/*various include files*/
#include <dos.h>          /*union REGS, int86*/
#include <graph.h>        /*_settextposition, _setbkcolor, _clearscreen*/

#define FALSE 0
#define TRUE 1
#define VIDEO 0x10        /*software interrupt 0x10 */
#define WRITE_ATTR_CHAR 9 /*function 9 */

struct _scrn {
    char *chrs;
    char cw;
    char rw;
    char att;
};

void disp_screen(struct _scrn *, unsigned short );

struct _scrn screen_mainmenu[]={
{"╔════════════════════════════════════╗",21,6,31},
{"                                      ",21,7,31},
{"         Grade Book Main Menu         ",21,8,31},
{"                                      ",21,9,31},
{" 1) Scan Grades                       ",21,10,31},
{" 2) Edit/View Grades                  ",21,11,31},
{" 3) Print Grade Book                  ",21,12,31},
{" 4) Scan Names                        ",21,13,31},
{" 5) Print Rosters                     ",21,14,31},
{" 6) Other Print Functions             ",21,15,31},
{" 7) Set Teacher Information           ",21,16,31},
{" 8) Drop Lowest Grade                 ",21,17,31},
{" 9) Exit                              ",21,18,31},
{"                                      ",21,19,31},
{" For help, press <Alt> <H>.           ",21,20,31},
{"╚════════════════════════════════════╝",21,21,31},
{"\0",0,0,0}
};

main()
{
    disp_screen(screen_mainmenu, TRUE);  /*clear screen and then display the
                                           screen defined by screen_mainmenu*/
}

long color_back_grnd= 1;  /*all screens will use a blue background*/

/*-----------------------------------------------------------
   disp_screen - Use ptr passed to array of structures containing
   &text; col; row; and attribute. Use BIOS int 10h function 9 to
   display the data. If cls_flag is TRUE, clear the screen before
   displaying the data. When clearing the screen, use the attribute
   defined in the variable color_back_grnd
------------------------------------------------------------*/
void disp_screen(p, cls_flag)
struct _scrn *p;
unsigned short cls_flag;
{
    char wcol;
    char * wsptr;
    union REGS inregs, outregs;

    if (cls_flag)
        {
        _setbkcolor(color_back_grnd);
        _clearscreen(_GCLEARSCREEN);
        }
    inregs.h.ah = WRITE_ATTR_CHAR;       /*print char and attribute*/
    inregs.x.cx = 1;                     /*print 1 char*/
    while ( *(p->chrs) )
        {
        wsptr=p->chrs;                   /*get ptr to string*/
        wcol=p->cw;
        inregs.h.bh = 0;                 /*video page 0*/
        inregs.h.bl = p->att;            /*attribute to use */
        while (inregs.h.al = *wsptr++)   /*char to print*/
            {
            /*position the cursor*/
            _settextposition( (short) p->rw, (short) wcol++);
            int86 ( VIDEO, &inregs, &outregs );  /*print with BIOS*/
            }
        p++;
        }
}

Prototyping Experiences
Brett Martensen

Brett Martensen is a Senior Systems Consultant with SRI Strategic Resources, Inc. He specializes in tools and techniques, including CASE, prototyping and JAD, to develop database applications. Areas such as entity relationship data modeling are his forte. He has an M.Sc. in Computer Science (1976) from Queen's University (Kingston, Ontario).

When developing a prototype, one is faced with reaching a maximum level of functionality across the maximum scope of the application, but within a minimum time frame. 
Two productivity tools help reach these conflicting goals: the CASE (Computer Aided Software Engineering) tool, which is the specification engine, and a DBMS (DataBase Management System), which is the application engine. I recently participated with a team to develop a prototype system for Canada Post Corporation. This prototype had to be robust enough to be used across the country, at a number of different user sites during a three-month trial. Thus, it had to be more functionally complete than would normally be expected of a prototype. Background A prototype is a miniature system which approximates the final system but provides only a subset of the application's scope and functionality. As such, a prototype comes with all the benefits associated with modeling. A model is easier and certainly less expensive to change than a real system. Prototyping permits developers to elicit, model and then capture user requirements for a system. Like the buildings on a movie set, a prototype must look real even though it is only a facade. On a movie set, certain buildings have rooms, some of which are furnished. Similarly some features in a prototype are fully implemented, while others remain as images only. There are four levels of functionality used when describing a prototype:

Level   Functionality
one     Screens only.
two     Screens with field entry and edit, some controllable flow.
three   Level two plus Create, Retrieve, Update and Delete of data and Menu linking screens together.
four    Level three plus Integrity checking, a correctly structured database and some application specific algorithms working.

Most prototypes end up as a mixture of these levels applied to different parts of the application. The prototype developer must be able to respecify the data model and quickly regenerate the database structure because a prototype is not a static model; it goes through a number of iterations. 
A typical prototype development iteration consists of analyzing the requirements (read documentation, conduct interviews); specifying (data model, functional specifications); designing (display layout, reports); developing (program, fill in tables of data); demonstrating and using; and finally, reviewing the design with users using Joint Application Design (JAD) techniques. The JAD is a meeting in which a consensus can be reached amongst the user population as to the system requirements. The results of the JAD provide the requirements for the start of the next iteration. Theoretically, this cycle should be repeated three times during a prototyping project. Ideally, each iteration of the prototype progresses closer to the specification for the final system. A rule of thumb is that sixty percent of the remaining requirements are captured in each cycle. After the second cycle, the system should be 84 percent correct and after the third cycle, 93+ percent correct (rather like golf where each shot gets you closer to the hole). At some point, however, diminishing returns make further iterations pointless. The Canada Post Prototype The Canada Post Corporation prototype was developed using E-R Designer from Chen and Associates, a CASE tool for data modeling to specify the data model; and ZIM from Sterling Software, Zanthe Systems Division, a powerful 4GL DBMS to develop the prototype. ZIM is a natural choice for prototyping projects. It directly implements an Entity-Relationship data model which it keeps in an Active and Integrated Data Dictionary. An entity-relationship database structure allows transferring the conceptual data model produced in the specification stage directly into the database data dictionary. An Active data dictionary is important because the programs can access all the metadata. An Integrated data dictionary stores display specification information as well as the database structure. 
An advantage of choosing E-R Designer is that the data model can be exported and used to create the ZIM database description without rekeyboarding. ZIM also runs code in either interpreted or compiled mode. Since prototype performance is not a consideration, the interpreted mode is used. This mode has a macro substitution capability which allows names of entities, relationships, displays, and ZIM routines to be substituted in the programs and resolved at run time. The data dictionary stores all these names, as well as segments of ZIM code for macro substitution. Thus, the data dictionary becomes the repository of the prototype specifications, data, displays and programs. Other tools such as WordPerfect and internally developed ZIM utilities increased productivity further. We used WordPerfect's line draw facility to draw boxes and rapidly design the displays. Then, below each display, the positions of all the fields and their prompts were specified. A short ZIM program analyzed this information and created the display form specifications in the ZIM data dictionary. One of ZIM's most useful prototyping features is the way in which display forms relate to entities in the database. If a field on a display form is given the same name as a field in an entity, then the ZIM command CHANGE Form FROM EntitySet fills in the fields of the current form from the fields of the same name in the current record of the entity set. Similarly, ADD EntitySet FROM Form creates a new record in the entity set from the data entered on the form. This functional relationship between display form fields and entity set fields is reinforced by maintaining the metadata in the data dictionary for each entity set field. (See Table 1.) The other ZIM database information normally needed (length, decimals, indexed, and required) is also stored in the data dictionary. Other useful attributes, such as default value, data mask and validation rules, exist for the fields in the display form. 
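The name matching behind CHANGE Form FROM EntitySet can be sketched in a few lines of C. This is only an illustration of the idea (the field structure and function here are invented for the sketch; ZIM itself is a 4GL and does this internally through its data dictionary):

```c
#include <string.h>

/* Hypothetical field: a name plus a displayable value. */
struct field {
    const char *name;
    char value[40];
};

/* For every destination field whose name also appears in src,
   copy the source value across -- the essence of filling a form
   from the current record of an entity set. */
void fill_by_name(struct field *dst, int ndst,
                  const struct field *src, int nsrc)
{
    int i, j;

    for (i = 0; i < ndst; i++)
        for (j = 0; j < nsrc; j++)
            if (strcmp(dst[i].name, src[j].name) == 0) {
                strcpy(dst[i].value, src[j].value);
                break;
            }
}
```

Because the match is by name, the form and the entity set need not list their fields in the same order, and fields unique to either side are simply left alone.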
We developed a number of generic, table-driven modules for the prototype's functional side. The concept of table-driven software is extremely powerful: by simply modifying the data in the tables, the developer can rapidly change the way in which a module functions, what it operates on and how it appears to the user. For example, we developed a menu program which was totally table driven. By changing the tables, the menu structure, the menu functions, and the routines executed when menu items were chosen could all be modified. With the table-driven approach, code can be reused. For example, a single modify routine can be applied to any entity set, and an enhancement made to a routine is universally available in the prototype system. Thus, the only nonreusable code is the application-specific algorithms, which are all linked in via the program name attribute attached to each field. For the Canada Post Corporation prototype, generic table-driven routines were developed for:

Menu
Level 1 and 2 screens (slide show)
Entityset Lister, which provided access to the functions of: Sort list, Pick record, Print list, Find record, Add record, Modify record, Delete record, and Help

This list doesn't include "Reporting" since programming a general-purpose reporting module is difficult to perform using the table-driven approach. The Print list routine provided simple reports. More complex reports which needed to function were hand-coded in ZIM. Reports that did not need to function were presented as Level one screens. There exist both a spreadsheet and a business graphics package which take data from a ZIM database and allow for its manipulation, analysis and presentation. Given that the Canada Post Corporation application involved large quantities of statistical information, both these packages were linked into the prototype to assist the user in performing ad hoc inquiries and analysis of the data. 
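The table-driven menu program described above translates naturally into C, where the "table" is an array of label/routine pairs. The sketch below is hypothetical (the actual prototype was written in ZIM, and every name here is invented for illustration), but it shows why only the table need change when the menu does:

```c
#include <string.h>

/* One menu entry: a label and the routine run when it is chosen. */
struct menu_item {
    const char *label;
    int (*action)(void);
};

/* Placeholder routines standing in for real application functions. */
static int add_record(void)    { return 1; }
static int delete_record(void) { return 2; }

/* The table IS the menu: add, remove, or reroute items by editing
   only this data, never the dispatch code below. */
static struct menu_item menu[] = {
    { "Add record",    add_record },
    { "Delete record", delete_record },
    { 0, 0 }                     /* table terminator */
};

/* Generic dispatcher: find an item by label and run its routine;
   returns 0 if the label is not in the table. */
int run_item(const char *label)
{
    struct menu_item *m;

    for (m = menu; m->label; m++)
        if (strcmp(m->label, label) == 0)
            return m->action();
    return 0;
}
```

The dispatcher never changes; enhancing it (say, adding logging) instantly benefits every menu in the system, which is exactly the reuse argument made above.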
These two packages were especially useful for designing new reports and re-creating existing ones to elicit user feedback. They added substantial functionality to the prototype with very little effort. The modules of the prototype were linked together in the calling sequence as shown in Figure 1. This prototype environment can be easily extended as new generic routines are developed. A table-driven ad hoc inquiry function could be built and linked in via the menu. Since the purpose of a prototype is to develop a working model with frequent feedback from the user population, it is appropriate to add a feature which captures users' ideas while they are fresh in their minds. Since "help" was available throughout the prototype, a suggestion box feature was added to the "help" module which allowed for on-line and in-context idea capturing. These ideas were collected, printed and analyzed during the JAD sessions. User feedback was very general on the first iteration: "We also need to be able to store information about our services," for example. In subsequent iterations, suggestions became more specific: "Use the word 'item' rather than 'product'." The final goods delivered from a prototyping project consist of the working prototype and a large quantity of documentation. The documentation covers the working prototype and user requirements that were not implemented in the prototype, such as complicated application-specific algorithms or feedback from the final JAD. Conclusion The combined use of CASE tools and 4GLs allows for greater productivity in prototype development. Using generic table-driven modules results in less software development. As a result, the workload shifts to the specification and analysis tasks. As in so many situations where technology is applied to the automation of the simpler tasks, fewer people are required, but the ones remaining need more expertise. 
The skills of business and functional analysis, data modeling, and design become more important than programming. Reference Application Prototyping: A Requirements Definition Strategy for the 80's, Bernard H. Boar, Wiley-Interscience Publication, John Wiley & Sons, New York, 1984. 
Figure 1 
Name -- The unique name given to this data element. 
Type -- Whether alpha-numeric, integer, character, etc. 
Prompt -- The full user name to appear on displays. 
Column Header -- An abbreviated name to appear at the top of any list. 
Program Name -- The name of the ZIM routine to be executed whenever the value of the field is changed. 
Helptext -- A user-readable explanation of the data element, its possible values and purpose, if necessary. 
Figure 2 
The UI2 Code Generator Paul Combellick Paul Combellick has a BS in Petroleum Engineering from the University of Alaska, Fairbanks. He is a contract programmer specializing in dBASE and C local area network (LAN) database applications and can be reached at (602) 280-2569 or via Compuserve at 70671,3054. As a Network DBMS applications developer, I recently undertook my first major C project. After having scoffed at UI2 for several months, I decided to give it a try as a fast prototyping tool. The resulting productivity gains exceeded my most optimistic expectations. I was able to produce about fifty percent of the 20,000 lines of C code in this application using UI2. I used UI2 with the Vermont Views Screen library and the Btrieve Record Manager to build a Network DBMS application. As I was new to all three tools, I spent a third of the project time learning them, with the remaining weeks actually producing generated and hand-written code. By the time I completed this single DBMS application, I had produced a set of templates and template libraries, in UI2 terminology, that would allow me to produce the next Network DBMS application in a few weeks, rather than in the months it would take if the code were entirely written by hand. 
Description Of UI2 UI Programmer Version Two, The Developer's Release, by Wallsoft is a programmable code generator targeted toward dBASE programmers, but it is flexible enough to be used for many languages in the MS/PC-DOS environment, including C. UI2 contains four major components. 
Screen Editor -- The user can interactively draw screens and define screen entities including menus, background text, boxes, variables, and fields. 
Templates -- Code generation language files that define how the code for a particular screen will be created. 
Template Libraries -- A library contains groups of functions written in UI2's generation language that are called by the templates during the generation of the target source code file. 
Code Generator -- An interpreter that executes the template language to generate the target source code for a particular screen. 
UI2 is shipped with a set of templates and template function libraries for dBASE programmers. The C programmer will have to create his own templates before any non-trivial C code, other than simple menus, can be generated. Case Study For this application the client specified C for portability to OS/2 and UNIX. The target system is a Novell Network which supports the Btrieve database server. Btrieve also has versions for OS/2 and Intel-based PCs running UNIX. I chose the Vermont Views screen library for its portability to UNIX and OS/2. This networked DBMS application contains several types of screen input/output: menu-only screens, data entry-only screens, combined menu and data entry screens, and reports. I created three templates, one for each of the first three screen types. These templates are actually source code files, written in the code generation language, that are executed by the code generator's interpreter. The templates describe how the code generator should handle a particular screen object to produce the target source code. 
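At its core, such a generator performs textual substitution driven by the screen definition. As a rough illustration only (UI2's template language is far richer and is interpreted, not C), here is a minimal C sketch of replacing one placeholder, the way "{menuname}" in Listing 1 becomes "CUG" in Listing 2; the function and its names are my own.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch of the substitution step a code generator
   performs: replace the first occurrence of key (e.g. "{menuname}")
   in a template line with the screen's value (e.g. "CUG"). */
int expand(const char *tmpl, const char *key, const char *val,
           char *out, size_t outsz)
{
    const char *p = strstr(tmpl, key);
    size_t pre;

    /* fail if the key is absent or the result would not fit */
    if (p == NULL || strlen(tmpl) - strlen(key) + strlen(val) >= outsz)
        return -1;

    pre = (size_t)(p - tmpl);
    memcpy(out, tmpl, pre);      /* text before the placeholder */
    out[pre] = '\0';
    strcat(out, val);            /* the substituted value       */
    strcat(out, p + strlen(key));/* text after the placeholder  */
    return 0;
}
```

A real template engine would loop over many placeholders and many lines, but every generated statement in Listing 2 is ultimately produced by this kind of replacement.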
I painted the screens by drawing boxes, menus and their actions, background text, and defining data entry fields. Through the screen editor the user may specify menu attributes such as hot keys and the name of a function to call when the menu is selected. The editor also allows specification of field attributes such as type, width, picture and provision for such user-designed features as begin-field and end-field trigger and validation functions. The programmer designs the template language functions to take advantage of the entities and their attributes defined in the screen editor. UI2 has an interactive mode as well as a command line mode that allows UI2's code generator to be accessed by make. A make response file can include the dependencies for generating target source code files from screen definitions, as well as compiling and linking the entire system. I can now modify a template file and make will call UI2 to regenerate all the affected source code modules. To modify a screen definition file -- either by adding text or a data element, or changing one of the many field, form, or menu attributes -- I don't edit the C source file; instead, I modify the screen definition file using the UI2 screen editor. I design screens so that I never modify the UI2 generated C code files except through the UI2 screen editor. UI2 is not limited to any particular coding style or third-party library and is adaptable to many different compilers. UI2 was used on this project to generate 40 screens and about 10,000 lines of C code. The code generation language syntax is very dBASE-like and the learning period was brief. UI2 Strengths This type of code generator performs very well on repetitive tasks such as building screens. I was able to build all the screens -- both menu and data entry -- entirely with UI2. After learning Vermont Views, Btrieve, UI2 and building templates, I will probably be able to reproduce 40 new screens for a new application in a few days. 
More importantly, the generated C code will be free of syntax errors and errant pointers. This bug-free code is at least as important as the productivity in creating the code. Once the templates are debugged, future screens will be virtually free of syntax errors. On the very first project, I used UI2 to boost productivity significantly, despite a learning period to become familiar with a new tool. Limitations In light of the fact that UI2 was designed with the dBASE programmer in mind, it lacks a couple of features for the C programmer. The most obvious feature missing is a full-featured dictionary that supports C data types, including structures and scoping concepts, and general data file schema beyond the dBASE file support. However, I was able to work around most of the data dictionary limitations by creating a hidden box in each screen. The box, made up mostly of #includes and external declarations, contained code for the generator to insert literally into the generated C source code. Conclusion I am quite satisfied with UI2. I have created templates for my non-programmer partner to fast prototype systems for prospective clients in order to illustrate what a proposed system may look like. I believe that UI2 will boost my coding and debugging productivity by factors of five to ten in the area of screen generation and maintenance. On future projects I expect to realize tremendous productivity gains now that I am familiar with this tool and have created a set of templates and template libraries to create code that utilizes Vermont Views Screen Library. Listing 1 The fragment of the template in Listing 1 expands to produce the C code in Listing 2. 
/*********************** define the form ***************************/
<> /* define a form */
{menuname}_dfmp = fm_def( {formbox.row}, {formbox.col},
    {formbox.height}, {formbox.width}, LNORMAL, BDR_NULLP );
/* define boxes around form items ****/
<>
/*********** define background text */
<>
sfm_help( "*DATA HELP", {menuname}_dfmp ); /* define form help keyword */
<>
/******* define form data fields *********/
<>
Listing 2 
/*********************** define the form ***************************/
/* define a form */
CUG_dfmp = fm_def( 0, 0, 21, 80, LNORMAL, BDR_NULLP );
/* define boxes around form items ****/
bg_boxdef( 0, 0, 21, 80, LNORMAL, BDR_SPACEP, CUG_dfmp );
bg_boxdef( 5, 14, 11, 52, LNORMAL, BDR_DLNP, CUG_dfmp );
/*********** define background text */
bg_txtdef( 1, 28, "C USER'S GROUP UI2 DEMO", LNORMAL, CUG_dfmp );
bg_txtdef( 2, 28, " ", LNORMAL, CUG_dfmp );
bg_boxdef( 5, 14, 11, 52, LNORMAL, BDR_DLNP, CUG_dfmp );
bg_txtdef( 7, 19, "Name : [ ]", LNORMAL, CUG_dfmp );
bg_txtdef( 8, 19, "Address : [ ]", LNORMAL, CUG_dfmp );
bg_txtdef( 9, 19, "City : [ ]", LNORMAL, CUG_dfmp );
bg_txtdef( 10, 19, "State : [ ] Zip : [ - ]", LNORMAL, CUG_dfmp );
bg_txtdef( 12, 19, "Phone : [ ]", LNORMAL, CUG_dfmp );
bg_txtdef( 13, 19, "Fax : [ ]", LNORMAL, CUG_dfmp );
sfm_help( "*DATA HELP", CUG_dfmp ); /* define form help keyword */
/******* define form data fields *********/
CUG_fld1 = fld_def( 7, 33, NULLP, FADJACENT, "!!!!!!!!!!!!!!!!!!!!!!!!!",
    F_STRING, (PTR) name, CUG_dfmp );
CUG_fld2 = fld_def( 8, 33, NULLP, FADJACENT, "XXXXXXXXXXXXXXXXXXXXXXXXX",
    F_STRING, (PTR) address, CUG_dfmp );
CUG_fld3 = fld_def( 9, 33, NULLP, FADJACENT, "XXXXXXXXXXXXXXXXXXXXXXXXX",
    F_STRING, (PTR) city, CUG_dfmp );
CUG_fld4 = fld_def( 10, 33, NULLP, FADJACENT, "!!",
    F_STRING, (PTR) state, CUG_dfmp );
CUG_fld5 = fld_def( 10, 48, NULLP, FADJACENT, "UUUUU-UUUU",
    F_STRING, (PTR) zip, CUG_dfmp );
CUG_fld6 = fld_def( 12, 33, NULLP, FADJACENT, "(UUU)UUU-UUUU",
    F_STRING, (PTR) phone, CUG_dfmp );
CUG_fld7 = fld_def( 13, 33, NULLP, FADJACENT, "(UUU)UUU-UUUU",
    F_STRING, (PTR) fax, CUG_dfmp );
MEL: A Metalanguage Processor George Crews George M. Crews received his bachelor's in General Engineering from the University of Nevada at Las Vegas, and his master's in Engineering Science from the University of Tennessee at Knoxville. He is a "generalist" with over 15 years experience in mechanical and software engineering design and analysis. He may be contacted at 109 Ashland Lane, Oak Ridge, TN 37830, (615) 481-0414. As a mechanical engineer, my experience with analysis programs falls in the areas of structural stress, fluid dynamics, heat conduction, and thermal/hydraulic system simulation. Such programs present the technical software developer with a number of unique problems, not least of which is providing a user-friendly interface. Though program users tend to be computer literate, input data can often be voluminous and tedious to prepare; the typical user may make many runs with only slight modifications, as design optimization is often accomplished by repeated analysis. Both input and output must be stored and presented in a manner that allows independent verification and validation. Finally, the information output from one program may be required as input by another. Another big headache is that modern (i.e., graphical) user interfaces tend to be hardware or system-software specific. A good universal interface would free the developer from the nuances of different machines and operating systems, while at the same time representing a standard that machine-specific routines can work with. MEL is my solution for making such technical programs more user-friendly and modularized. MEL (for MEtaLanguage data processor) is a set of input/output utilities that provides a standard interface between the program and the user. It can translate input data written in "pseudo-English" (Example 1), making the data available to the program as variables (Example 2). 
It can also translate program variables (Example 3) into pseudo-English (Example 4). Effort was made to provide data objects that could be easily incorporated into almost any engineering analysis program (Example 5). First, the pseudo-English look of MEL means that I/O will be more readable and comprehensible to the user (or checker). Second, MEL is object oriented in that it provides a structured and encapsulated I/O interface. Thus, development time will be reduced and future changes can be made to the program more easily. Third, MEL's grammar is simple and unambiguous, with input and output formats identical, so that output from one program may serve directly as input to another. Finally, MEL can read and write data directly to a file so that a permanent record of a run and its results is available. Description In MEL the smallest unit of pseudo-English I/O is called a "descriptor." Its purpose is to describe something, either data or a command, to a program. The general format for descriptors is much like a function call in a typical programming language: an I/O unit consists of a descriptor name (somewhat like a function name), followed by a parameter list, followed by an end-of-unit symbol (the semicolon). For example, consider the following MEL descriptor, which could be used as part of the input to a piping network analysis program: 
pipe, length = 100 (ft), diameter = 6 (in); 
This is a pipe descriptor whose parameters are length and diameter. The values assigned to these parameters would be 100 and 6, in units of feet and inches, respectively. Although the tokens (names and parameters) making up descriptors are customized by the developer for each individual application program, the above grammar remains the same for all programs using MEL. (See Example 1 and Example 4.) MEL's format was chosen for its simplicity, while allowing for as much flexibility as possible without introducing ambiguity. 
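To make the grammar concrete, here is a hypothetical C sketch of splitting a descriptor into its name and its "parameter = value (units)" parts. The real meli() routines are dictionary-driven and handle abbreviation, default parameter order, comments, and arrays; none of that is attempted here, and all names are illustrative.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative only: one parsed "name = value (units)" parameter. */
struct param { char name[32]; double value; char units[16]; };

/* Split a MEL-style descriptor such as
       pipe, length = 100 (ft), diameter = 6 (in);
   into its name and numeric parameters.
   Returns the number of parameters parsed, or -1 on error. */
int parse_descriptor(const char *text, char *dname,
                     struct param *p, int maxp)
{
    char buf[256];
    char *tok;
    int n = 0;

    strncpy(buf, text, sizeof buf - 1);   /* strtok modifies its input */
    buf[sizeof buf - 1] = '\0';

    tok = strtok(buf, ",;");              /* descriptor name comes first */
    if (tok == NULL)
        return -1;
    sscanf(tok, " %31s", dname);

    while ((tok = strtok(NULL, ",;")) != NULL && n < maxp) {
        /* each piece looks like " length = 100 (ft)" */
        if (sscanf(tok, " %31[^= ] = %lf ( %15[^)] )",
                   p[n].name, &p[n].value, p[n].units) == 3)
            n++;
    }
    return n;
}
```

The grammar's regularity is what makes such a parser small: every descriptor, for every application, has the same comma-separated, semicolon-terminated shape.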
In MEL, tokens may be abbreviated as long as they remain uniquely identifiable. MEL assumes a default parameter order if parameter names are missing. Comments may be included by enclosing them in double quotes; parameter values may be labeled as "unknown," etc. These format choices are designed to make programs incorporating MEL as convenient to the user as possible. Incorporating MEL In order to incorporate MEL into one of your own programs, you must customize the mel.h header file to be included in your application source code file. First create a "dictionary" for both input and output that defines the proper spelling, number, and types (integer, array, etc.) of data associated with each descriptor and parameter. (Note that by simply changing spellings in the dictionary you could go from pseudo-English to "pseudo-French" or some other "pseudo-language.") The task of defining dictionaries has been made as painless as possible by providing complete instructions and an example program on the MEL diskette available through the CUG library. (The diskette contains MEL source code, header file, documentation and instructions, an example program, and a conversion factor routine. Since a listing of all MEL routines would run over 50 pages, a complete listing has not been included with this article.) You will need to prepare documentation for the user, defining the dictionaries and explaining what the tokens mean. To obtain data from a descriptor, you must first read it and then extract the data (see Example 2). An example of outputting data is shown in Example 3. Allowing the user to input data with different units requires conversion to internal units (ASTM, 1982). Included on the MEL diskette is a routine that can convert more than 150 different units. Additional units and conversion factors can easily be added to the source code. How MEL Was Developed An early decision was to write MEL in C. 
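A conversion routine of the kind described might look like the following C sketch. The table layout and function names here are assumptions, not the diskette's actual code; the factors shown are the standard exact definitions.

```c
#include <string.h>

/* Hypothetical sketch of a unit-conversion table: each entry maps a
   unit name to the factor that converts it to an internal (SI) base
   unit. A real table would cover 150+ units and check dimensions. */
struct unit { const char *name; double to_si; };

static const struct unit length_units[] = {
    { "m",  1.0    },
    { "ft", 0.3048 },   /* exact by definition */
    { "in", 0.0254 },   /* exact by definition */
};

/* convert a value in the named unit to internal SI units;
   returns 0 on success, -1 if the unit is not in the table */
int to_internal(double value, const char *name, double *out)
{
    size_t i;
    for (i = 0; i < sizeof length_units / sizeof length_units[0]; i++) {
        if (strcmp(length_units[i].name, name) == 0) {
            *out = value * length_units[i].to_si;
            return 0;
        }
    }
    return -1;
}
```

Adding a new unit is then a one-line table entry, which matches the article's claim that "additional units and conversion factors can easily be added to the source code."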
Fortran is the traditional language for scientific programs; however, engineers like myself are beginning to realize that there is more to technical software development than simply correctly coding a complex algorithm. ANSI C has a number of significant non-numerical advantages over Fortran (Kempf, 1987). C allows for more flexible structured programming and data encapsulation techniques to be applied (also see Jeffery, 1989). C has more operators and program control constructs than Fortran. C allows indirection (pointers) where Fortran does not. C more easily interfaces to existing system software since much of this software is itself written in C. Also, C is a popular language for unconventional computer architectures such as parallel processors (Lusk, 1987) and neural networks. Let me also mention some of C's shortcomings, which are related to its relative naivete for scientific purposes. Dynamic array dimensioning in C is convoluted (Press, 1988). C does not have the numerical library that Fortran does. And finally, C does not allow operator overloading for data structures (complex numbers for example) nor does it have an exponentiation operator. However, I do not think these deficiencies are difficult to overcome. Partly as an experiment to form my own opinion about OOP, the design of MEL incorporates the object-oriented paradigm. I chose to make use of C's preprocessor to restrict the visibility of public type, function, and data declarations to just those objects that the application program may need at a certain place (see Example 5). (The private type, function, and variable data needed by the MEL routines themselves are not shown in the example and are hidden from your program by other defined/undefined manifest constants.) For another approach refer to the article by Jeffery. Summary And Future Enhancement Software engineering is rapidly evolving and everyone seems to have his or her own ideas about what makes a good user-interface. 
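The dynamic-dimensioning complaint can be illustrated with the usual row-pointer workaround, in the spirit of Press et al.; this sketch is mine, not code taken from the references.

```c
#include <stdlib.h>

/* The row-pointer technique for a dynamically dimensioned 2-D array:
   one contiguous block for the data and one block of row pointers,
   so that m[i][j] indexing works as usual. */
double **matrix_alloc(size_t rows, size_t cols)
{
    double **m = malloc(rows * sizeof *m);
    size_t i;
    if (m == NULL)
        return NULL;
    m[0] = calloc(rows * cols, sizeof **m);  /* data, zero-initialized */
    if (m[0] == NULL) {
        free(m);
        return NULL;
    }
    for (i = 1; i < rows; i++)
        m[i] = m[0] + i * cols;   /* each row points into the one block */
    return m;
}

void matrix_free(double **m)
{
    if (m != NULL) {
        free(m[0]);   /* the data block  */
        free(m);      /* the row vector  */
    }
}
```

In Fortran this is a one-line declaration; in C it takes an allocator, a deallocator, and care with ownership, which is exactly the "convoluted" quality being criticized.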
I believe MEL is a practical answer to the spectrum of interface problems confronting the developer and user of complex technical programs. Some may criticize MEL for its verbosity (as compared to Fortran's fixed field format), the time a user must spend learning to use MEL (versus a more interactive interface), and the somewhat clumsy way objects must be (or at least, were) encoded in C. These points are legitimate and are inherent in MEL's design. No design can be all things to all people. The next steps in MEL's evolution might be incorporating it into a language sensitive editor, a graphical output post-processor, and perhaps later, into an expert system shell specialized for the type of analysis being performed. Bibliography George M. Crews, "HAPN--A Hydraulic Analysis of Piping Networks Program," Master's Thesis in Engineering Science, University of Tennessee, Knoxville, 1989. A portion of this thesis describes MEL and how it was developed and used for a specific analysis program. David Jeffery, "Object-Oriented Programming in ANSI C," Computer Language Magazine, February, 1989. This article discusses the object-oriented paradigm and a way to implement it in C. James Kempf, Numerical Software Tools in C, Prentice-Hall, Inc., 1987. This book contains an introduction to both numerical programming and C. The emphasis of the text is on creating small routines that can be used as building blocks for larger programs. Possible shortcomings are its lack of data hiding and that it treats doubly dimensioned arrays statically rather than dynamically. Ewing Lusk, Overbeek, et al., Portable Programs for Parallel Processors, Holt, Rinehart and Winston, Inc., 1987. This book describes a set of C tools for use on a broad range of parallel machines. William H. Press, Flannery, et al., Numerical Recipes in C, Cambridge University Press, 1988. Based on an earlier Fortran edition, this is a great cookbook giving a wide range of oven-tested recipes for the numerical gourmet. 
It shows the correct way to handle multidimensioned arrays (dynamically). A complaint sometimes heard is that a few of the algorithms are getting obsolete due to rapid advances in numerical techniques. ASTM E 380-82 Standard for Metric Practice, American Society for Testing and Materials, 1982. This standard contains many useful conversion factors between English and metric units. 
Listing 1 
Example 1. An Example of MEL Input for a Hydraulic Analysis Program. (Note that tokens will be unique to each application.) 
title, 'Example Problem Illustrating MEL';
fluid, "water" density = 62.4 (lbm/ft3), viscosity = 1 (cp);
node, 1, pressure = 8.67 (psi); "20 ft of water"
branch, 100, from_node = 1, to_node = 2;
pipe, length = 100 (ft), id = 6 (in), material = steel;
end_of_branch;
node, 2, pressure = 6.5 (psi); "15 ft of water"
next;
Listing 2 
Example 2. Example of Obtaining Data From a MEL Descriptor: 
Descriptor:
pipe, length = 100 (ft), diameter = 6 (in);
Code fragment:
double pipe_length, diameter;
union meli_param_data data; /* see Example 5. */
char units[MAX_STRING_LEN+1];
int array_len;
int unknown_flag;

meli(); /* reads descriptor */
meli_data("length", &data, units,
    &array_len, &unknown_flag);   /* gets pipe length */
pipe_length = data.real;          /* will equal 100 */
meli_data("diameter", &data, units,
    &array_len, &unknown_flag);   /* gets pipe diameter */
diameter = data.real;             /* will equal 6 */
/* note that units, array_len, and unknown_flag are not considered (used). */
Listing 3 
Example 3. Example of Outputting a MEL Descriptor: 
Code fragment:
double pipe_length = 100, diameter = 6;
union melo_param_data data; /* see Example 5. */
char length_units[] = "ft";
char diameter_units[] = "in";
int array_len = 0;
int unknown_flag = 0;

melo_init("pipe"); /* initialize */
/* get data ready to output: */
data.real = pipe_length;
melo_data("length", &data, length_units, array_len, unknown_flag);
data.real = diameter;
melo_data("diameter", &data, diameter_units, array_len, unknown_flag);
melo(); /* translates data into string */
Descriptor:
pipe, length = 100 (ft), diameter = 6 (in);
Listing 4 
Example 4. An Example of Output Generated by a Hydraulic Analysis Program Using MEL. (From the input data given in Example 1.) 
program, name = 'HAPN - Hydraulic Analysis of Piping Networks',
    problem_title = 'Example Problem Illustrating MEL';
message, text = 'Date: Thu Jul 13 09:02:11 1989';
message, text = 'Input filename: input';
equations, node = 0, loop = 0, iterations = 7;
branch, number = 100, type = 'independent_branch',
    flow_rate = 436238 (lbm/h), flow_change = -6.20476e-007 (%),
    flow_dp = 2.17 (psi), elevation_dp = 0 (psi);
component, branch_number = 100, component_number = 0, type = 'pipe',
    resistance = 4.95228 (Pa*s2/kg2),
    change_resistance = -1.24095e-008 (%), pressure_drop = 2.17 (psi);
node, number = 1, pressure = 8.67 (psi);
node, number = 2, pressure = 6.5 (psi);
next;
Listing 5 
Example 5. Public Interface Between MEL and Any Application Program Using It. (Excerpted from mel.h header file.) 
/* if using MEL for input (#define MEL_INPUT), then must define the MEL input data object: */
#ifdef MEL_INPUT
/* firstly, define input constants (all must be CUSTOMIZED for specific application program): */
#define MELI_MAX_DESCRIP_STR_LEN 256
/* maximum number of characters in any input descriptor string. */
#define MELI_MAX_PARAMS 6
/* maximum number of parameters for any descriptor (min num = 1). 
*/
#define MELI_MAX_PARAM_STR_LEN 80
#define MELI_MAX_PARAM_ARRAY_STR_LEN 1
/* largest allowable parameter string lengths (min size = 1) */
#define MELI_MAX_PARAM_INT_ARRAY_LEN 1
#define MELI_MAX_PARAM_REAL_ARRAY_LEN 1
#define MELI_MAX_PARAM_STR_ARRAY_LEN 1
/* maximum number of elements in parameter data arrays (min = 1). */
#define MELI_UNITS_STR_LEN 80
/* maximum length of units associated with any param (min = 1) */

/* secondly, define input data structures: */
union meli_param_data {
    int integer; /* also holds boolean type */
    double real;
    char string[MELI_MAX_PARAM_STR_LEN+1];
    int integer_array[MELI_MAX_PARAM_INT_ARRAY_LEN];
    double real_array[MELI_MAX_PARAM_REAL_ARRAY_LEN];
    char string_array[MELI_MAX_PARAM_STR_ARRAY_LEN]
                     [MELI_MAX_PARAM_ARRAY_STR_LEN+1];
};
/* this is used for input parameter data. it may either be an integer, real, string, array of integers, array of reals, or an array of strings. (to save space a union was used.) */

/* thirdly, define input variables: */
char meli_descriptor_string[MELI_MAX_DESCRIP_STR_LEN+1];
/* global storage for the input descriptor string. */

/* lastly, define input functions (typically they return 0 if no error encountered, else some nonzero error code): */
int meli_file(FILE *meli_file_handle);
/* read a descriptor string from the input stream and call meli(). also, put copy of string read into meli_descriptor_string. */
int meli(void);
/* translate meli_descriptor_string and put information into a private data structure (meli_datum). */
char *meli_descrip_type(void);
/* return pointer to name of type of descriptor read by meli(). */
int meli_num_params(void);
/* return number of parameters read by meli(). */
int meli_param(int param_num, char *param, union meli_param_data *data,
    char *units, int *array_len, int *unknown_flag);
/* fill argument list with param_num'th parameter read by meli(). (start with param_num = 0.) 
*/
int meli_data(char *param, union meli_param_data *data, char *units,
    int *array_len, int *unknown_flag);
/* see if *param was input. if it was, then fill argument list with data from meli_datum. */
#endif /* MEL_INPUT */

/* if using MEL for output, must define the MEL output data object: */
#ifdef MEL_OUTPUT
/* firstly, define output constants (all must be CUSTOMIZED): */
#define MELO_MAX_DESCRIP_STR_LEN 256
/* how many characters can be in an output descriptor string? */
#define MELO_MAX_PARAMS 6
/* maximum number of parameters for any descriptor. */
#define MELO_MAX_PARAM_STR_LEN 80
#define MELO_MAX_PARAM_ARRAY_STR_LEN 1
/* largest allowable parameter string length. */
#define MELO_MAX_PARAM_INT_ARRAY_LEN 1
#define MELO_MAX_PARAM_REAL_ARRAY_LEN 1
#define MELO_MAX_PARAM_STR_ARRAY_LEN 1
/* maximum number of elements in array of parameter data. */
#define MELO_UNITS_STR_LEN 80
/* maximum string length of any units associated with a param. */

/* secondly, define output data structures: */
union melo_param_data {
    int integer;
    double real;
    char string[MELO_MAX_PARAM_STR_LEN+1];
    int integer_array[MELO_MAX_PARAM_INT_ARRAY_LEN];
    double real_array[MELO_MAX_PARAM_REAL_ARRAY_LEN];
    char string_array[MELO_MAX_PARAM_STR_ARRAY_LEN]
                     [MELO_MAX_PARAM_ARRAY_STR_LEN+1];
};
/* this is for output parameter data. it may either be an integer, real, string, array of integers, array of reals, or an array of strings. (to save space a union was used.) */

/* thirdly, define output variables: */
char melo_descriptor_string[MELO_MAX_DESCRIP_STR_LEN+1];
/* global storage for the output descriptor string. */

/* lastly, define output functions (typically return 0 if no error): */
int melo_init(char *descrip_type);
/* initialize private data structure (melo_datum) to accept parameter data from following functions. output descriptor type will be descrip_type. returns 0 if no errors were encountered. 
*/
int melo_data(char *param, union melo_param_data *data, char *units,
    int array_len, int unknown_flag);
/* put data for parameter *param into the proper place in melo_datum. returns zero if no errors were encountered. */
void melo(int melo_verbose_flag);
/* takes the information in melo_datum and translates it into melo_descriptor_string. user must set melo_verbose_flag = 1 to make output as readable as possible, set it equal to zero to make output as terse as possible (and still remain in MEL format). */
int melo_file(FILE *melo_file_handle, int melo_verbose_flag);
/* take the information in melo_datum, translate it into melo_descriptor_string, and output it to file. */
#endif /* MEL_OUTPUT */

/* now define data objects common to both input and output: */
/* if an error occurs, MEL will try and tell you what happened. so define required error handling information: */
#define MEL_MAX_ERR_MSG_LEN 80
struct mel_errors {
    enum { /* which error occurred? */
        mel_no_err,
        mel_read_err,
        mel_write_err,
        mel_end_of_file_err,
        mel_end_of_data_err,
        mel_syntax_err,
        mel_unknown_descrip_name_err,
        mel_unknown_param_name_err,
        mel_missing_param_name_err,
        mel_param_data_err,
        mel_missing_paren_err,
        mel_too_many_param_err,
        mel_missing_bracket_err
    } type;
    int start_line; /* on which lines did err occur? */
    int end_line;   /* (meaningful for input only.) */
    char msg[MEL_MAX_ERR_MSG_LEN+1]; /* additional info describing err */
} mel_err; /* (not same as messages below). 
*/
#define MEL_MAX_NUM_ERR_MESSAGES 13
#ifdef MEL_INIT
/* the following describes each type of enumerated error: */
char mel_err_msg[MEL_MAX_NUM_ERR_MESSAGES][MEL_MAX_ERR_MSG_LEN+1] = {
    "No errors encountered",
    "Can't read file",
    "Can't write file",
    "Unexpected end of file encountered",
    "End of input data encountered",
    "Descriptor/parameter syntax error",
    "Unknown descriptor name",
    "Unknown parameter name",
    "A (or another) parameter name was expected but is missing",
    "Unable to read parameter value(s) for this descriptor",
    "Missing right parenthesis while reading units",
    "Too many (or duplicate) parameters given for this descriptor",
    "Missing brackets around array data"
};
#else
extern char mel_err_msg[MEL_MAX_NUM_ERR_MESSAGES][MEL_MAX_ERR_MSG_LEN+1];
#endif /* MEL_INIT */
Object-Oriented Programming As A Programming Style Eric White Eric White is a software engineer at Advanced Programming Institute, Ltd. He is working on a character-based version of XVT. XVT is a common programming interface with implementations for various window systems, including Macintosh, Microsoft Windows, Presentation Manager, OSF/Motif, and character-based environments on UNIX and MS-DOS. He can be reached at API at (303) 443-4223. Object-oriented programming is a programming style that can be used in many languages, including C and C++. Some programmers think that C++ gives them the ability to do object-oriented programming. This isn't accurate -- C programmers can already do object-oriented programming. I will demonstrate by showing two identically structured object-oriented programs, one in C and the other in C++. Even though one can do object-oriented programming in C, C++ offers several advantages: C++ supplies syntactic support for object-oriented programming, and C++ provides type checking that is not possible in C. I am assuming the reader has already read one of the numerous magazine articles that introduce object-oriented programming. 
A good article is "Mapping Object Oriented Concepts Into C++ Language Facilities", CUJ July '89 by Tsvi BarDavid. If you already know C, an example of object-oriented programming in C can clarify exactly what goes on in object-oriented programming. Once you understand the C example, the identical example in C++ can make learning C++ easier. You can even imagine how the code generated by a C++ translator looks. The Example I'll develop the comparison using a graphical application that could be the beginnings of a drawing program such as MacDraw. This example is constructed with four classes of objects: graph_obj, circle, square, and double_circle. Three instructions can be given to any one of these objects: 
init, which takes as arguments the initial position and size of the object. init initializes the object, then draws it. 
move, which draws the object in black, modifies the position, then draws it in white. move takes a change in the y and x coordinates as arguments. 
draw, used by init and move. draw takes a color as an argument. 
The Listings Listing 1 is the pseudo-code for the example. The code in Listing 2 (obj.h) and Listing 3 (obj.c) facilitates object-oriented programming in C, allowing the creation of classes, methods, and objects, and implementing inheritance. Listing 4 (drawc.c) and Listing 5 (drawcxx.cxx) are two examples of object-oriented code in C and C++ respectively. They perform identically. In the pseudo-code, you can see: 
We derive classes circle and square from class graph_obj. 
We derive class double_circle from class circle. 
All classes inherit the method move from class graph_obj. If method move needs to be invoked for an object of class circle, then method move of class graph_obj is actually the function called. We are able to reuse the move method for every class in this example. 
Class double_circle inherits the method init from class circle. 
Class double_circle overrides the method draw from class circle. 
If method draw needs to be invoked for an object of class double_circle, then the method is not inherited from the super-class. For portability, I isolate the graphics functions in a utility module. Listing 6 (utility.h) is the interface to the utility module. Listing 7 (utility.c) contains fatal() and the graphical functions. The utility module is compiled and linked with either the C or C++ code. The isolation also makes it easier to compare the two object-oriented examples. Object-Oriented Programming In C This system implements classes, methods, objects, inheritance, and messages. The entire module that facilitates object-oriented programming is less than 90 lines of code. I'll start with a simple data abstraction mechanism, then develop it into a system that supports classes, inheritance, and messages. The most natural means of creating an object and associating methods with it is to put pointers to the methods (pointers to functions) in a structure along with the data. A structure for an instance of a circle might look like this: 
struct {
    int y;
    int x;
    int radius;
    void (*init)();
    void (*draw)();
    void (*move)();
} circle;
This implements an object that knows how to initialize itself, draw itself, and move itself. The implementation could vary for different types (such as a double circle). However, we might get tired of setting up the methods every time we create a new instance of a circle. A solution is to design another structure (called a class) that contains the pointers to the functions, and place only a pointer to the class in each object. With this technique we may create a class once, then create several objects and have them point to that class. To make the class structure more generic, we define an array of pointers to functions, and by convention, define the methods as an index into this array. 
The code now looks like:

/* defines for methods */
#define INIT 0
#define DRAW 1
#define MOVE 2

struct class {
    int nbr_methods;
    void (**method)();
};

typedef struct class CLASS;

struct {
    CLASS *class;
    int y;
    int x;
    int radius;
} circle;

When creating a class, we need to initialize the array of pointers to functions after allocating memory for it. If the method is implemented in the class itself, then the pointer is set to the function address. If the method is inherited from the super-class, then the pointer is loaded from the super-class. To make an object more generic, we'll take the definition of the data out of the object and replace it with a pointer to the data. Space for the data is allocated when the object is created and freed when the object is no longer needed. Listing 2 contains the final definitions of the structures for class and object.

Classes

To define a class:

1. Define a structure to hold the information about the class. (Listing 6, lines 15-18)
2. Write the methods (the functions associated with the class). An example is the DRAW method for class circle. (Listing 6, lines 69-81)
3. Declare a structure of type CLASS. (Listing 6, line 143)
4. Call new_class(), which loads the pointers to the inherited methods from the super-class. It also saves the size of memory needed for each object in the class. (Listing 6, line 160)
5. Call reg_method() to register each method that we want to implement in the class being created. Registering a method means storing a pointer to a function in the array of pointers to functions. reg_method() shouldn't be called for methods inherited from the super-class. (Listing 6, lines 161-162)

Methods

A method is a function written specifically to go with the class. In this example, methods don't return a value. All methods should be aware that obj->data is a pointer to the data allocated on the heap. For a particular class, this data is of an assumed structure type.
By casting obj->data to a pointer to a structure, the method can access the object data correctly. All methods receive the argument arg_ptr, which can be used with the macro va_arg() if there are arguments to the method. See your documentation on stdarg.h.

Objects

The structure that holds what we need to know about an object is:

typedef struct {
    void *data;
    CLASS *class;
} OBJECT;

To create and use an object:

1. Declare a structure of type OBJECT. (Listing 6, line 148)
2. Call the function new_object(), which registers a class with the object and allocates memory for the object. (Listing 6, line 174)
3. Send messages to the object. With the graphical objects in the example, the first message that we want to send is the INIT message (Listing 6, line 175). After that, we can send MOVE or DRAW messages. (Listing 6, line 186)
4. When done with the object, call free_object(), which frees the allocated memory. (Listing 6, line 191)

Inheritance

Inheritance of methods is demonstrated here. circle inherits MOVE from class graph_obj. double_circle inherits INIT and MOVE from its super-classes. I implement inheritance of data structures by having a sub-class allocate more memory than the super-class. The sub-class data consists of the parent-class data followed by the data specific to the sub-class.

Messages

There is a distinction between a message and a method. A message is sent to an object, and then something decides which method to invoke. Invoking a method means calling the function that is part of the class. In C++, the translator decides which method to invoke. In the system implemented in C, the function message() (Listing 3) decides, based on the class of the object.

Summary Of OOP In C

One disadvantage of doing object-oriented programming in C is that there is no function prototyping. We have no idea what the arguments to a method are when we declare the pointers to functions in the class structure.
Programmers are responsible for sending the correct parameters to a message. Another disadvantage is that when writing methods, the programmer must access the data in the object correctly. The pointer to the data in the object structure must be cast as a pointer to the correct structure type.

Object-Oriented Programming In C++

The C++ example also demonstrates classes, methods, objects, inheritance, and messages. I'll explain a small subset of the syntax of C++, only what is essential to do object-oriented programming. There are many features of C++ that have nothing to do with object-oriented programming, and the object-oriented programming part of C++ is elaborate, with useful but nonessential features. The subset is:

Definition of a class, with and without a super-class.
Definition of a method.
Declaration of an object.
Sending a message to an object.

Classes

The three essential pieces of a class are: the data structure of the class, the super-class if there is one, and the methods. The definition of a class in C++ looks like:

class graph_obj {
public:
    int y;
    int x;
    void init(int y, int x);
    void move(int y, int x);
    virtual void draw(int color){};
};

y and x are the data that will be contained in an object of class graph_obj. To define methods, you put the function prototype for each method in the definition of the class. The class graph_obj doesn't have a super-class. When defining a class that does have a super-class, you follow the name of the class with a colon (:), the keyword public, and the name of the super-class. For example:

class circle : public graph_obj {
public:
    int radius;
    void init(int y, int x, int radius);
    void draw(int color);
};

Members of a class may be private or public. For simplicity's sake, all members of all classes in this example are public. I'm not attempting to do data-hiding in this example. Hiding data is a separate (and important) issue, but it is beyond the scope of this article.
The keyword public before the name of the super-class means that all the public members of the super-class are public members of the sub-class.

Methods

The definition of a method looks similar to that of a function. To form the name of the function, you follow the class name with the scope resolution operator (::) and the name of the method. For example, the draw method for class circle looks like this:

void circle::draw(int color)
{
    /* code to draw a circle */
    ...
}

Here is an important note about coding a method. A hidden argument to every method is the object itself. When a method is invoked for a particular object, by definition you get access to that object: you can access the members of that object just by using their names. Methods are invoked much as functions are called in C. Sometimes, when writing code for a method, we want to force the super-class's version of a method to be invoked, even though the class we are writing has a method of the same name. In this case, we can use the scope resolution operator (::) to force the method of the super-class to be invoked. In the init method for class circle, to invoke the init method of class graph_obj, we specify the name of the class, followed by the scope resolution operator, followed by the name of the method. Sometimes the method to invoke can't be determined until run time, because a particular section of code could be operating on many types of objects. In C++, such code must be operating on objects of a certain class, or of a sub-class of that class. If you declare a method virtual in the class highest in the class hierarchy, C++ will wait until run time to decide which method to invoke, and will invoke the correct method for the object being operated on. To do this, C++ puts something in the object that indicates which class it belongs to. Resolution of the method to invoke at run time is called late binding.
This is useful when you send messages through pointers to objects, where the pointer could point to one of several classes of objects. It's also useful in a method that serves a class and its sub-classes. draw is virtual because the method move (which uses the method draw) in class graph_obj also serves classes circle, double_circle, and square. In C++, each class can have two special methods: the constructor and the destructor. Essentially, the constructor is called automatically when an object comes into scope, and the destructor is called when an object goes out of scope. For example, if you declare an automatic object at the start of a function, the constructor is called at the time of declaration, and the destructor is called before the function returns to its caller. Constructors and destructors are not essential to object-oriented programming. In other systems, programmers write a method specifically for initializing an object when they need one, then send that message to the object after creating it. In the C++ example that accompanies this article, I don't use the built-in constructors and destructors. In both the C and C++ examples, I have a method that initializes the values of the graphical object. I call this method INIT. In the C example, I use a function that allocates memory for the object before use and frees the memory after use. These functions aren't defined as part of a class and should not be confused with methods.

Objects

An object declaration looks like a declaration of something for which there is a typedef in C. A declaration of an object of class circle looks like:

circle c1;

In the graphics example, immediately after declaring a graphical object, the init message is sent to the new object. This gives the object its starting position and size, and draws it on the screen. Listing 7, line 99 shows initialization of a circle at position (40, 40), with a radius of 20.
After sending the init message, we can send a move message to the object, causing it to move on the screen. (Listing 7, lines 103-105) In the C example, we use a pointer in an object to point to the data specific to that instance of the object. new_object() allocates that data on the heap, and free_object() frees it. In contrast, the C++ translator actually creates a structure that contains the data. In our example, this structure is an automatic structure; space for it is deallocated when main() returns. We don't need to free any data on the heap as we did in the C example.

Inheritance

Just as in the C example, the C++ example demonstrates inheritance of methods. double_circle inherits init and move from class circle.

Messages

Sending a message in C looks like:

message(&c1, MOVE, 1, 1);

Sending a message in C++ looks like:

c1.move(1, 1);

We specify the same essential elements in both cases: the object (c1), the message (MOVE or move), and the number of pixels to move in the y and x directions.

Summary Of OOP In C++

Data hiding and modularity are important issues in C++, as in other languages. I am not addressing these issues, and have put the entire program in one source file; I want to focus on the object-oriented aspect and keep it simple. Often in C++, when a message is sent to an object of a known type, the compiler resolves the particular method to invoke at compilation time. This is called early binding. In contrast, the function message() in the C scheme presented here resolves which method to invoke at run time. This is called late binding. Because the C methodology always does late binding, a little more code must always be executed at run time, so the C code may be a bit slower than the code generated by the C++ translator. However, when virtual functions are used, I believe the speed of sending a message in C is comparable to C++. C++ inherits many of the characteristics of C.
In C++, you have the ability to corrupt memory in the same ways that you can corrupt memory in C. This causes temporal and referential non-localization of bugs. C++ also offers the same beneficial characteristics as C, such as speed, compactness, and the possibility of portability.

Portability

The C code is quite portable and runs under:

Microsoft C v5.1
Microsoft Quick C v2.0
Zortech C compiler

The C++ code runs under:

Zortech C++ compiler
Glockenspiel C++ translator using the Microsoft C compiler v5.1

The graphics code works on CGA, EGA, Hercules, and VGA. The utility module can use either the graphics library that accompanies Microsoft C v5.0 or the graphics library that comes with the Zortech C++ compiler. If you are using the Microsoft graphics library and Hercules graphics, you need to run MSHERC.COM before you can run these programs. The Zortech graphics library has its origin at the lower-left corner; Microsoft has its origin at the upper-left corner. Also, because pixels are not square, neither the Zortech nor the Microsoft library draws perfectly round circles. Because this article focuses on object-oriented techniques and not on graphical techniques, I didn't address any of these problems.

Exercises

A few valuable exercises might be:

Make a new class, such as a diamond.
Make a new method, such as expand or contract, that changes the size of an object.
Adapt this system to another graphical system.

Acknowledgements

I thank Marc Rochkind and Tom Cargill, who taught me much of what I know about object-oriented programming.

Listing 1

Class Graphical Object

Graphical Object is an abstract class. There will never be any instances of this class. Classes Circle and Square are subclasses of this class.
Graphical Object data:
  y position
  x position

Graphical Object methods:
  INITIALIZE
    Arguments:
      Starting y position
      Starting x position
  DRAW
    Only implemented by subclasses
  MOVE
    Arguments:
      Increment in the y direction
      Increment in the x direction
    Send the draw black message to the object (erase the object).
    Modify the x and y position of the object per the arguments passed to the MOVE method.
    Send the draw white message to the object.

Class Circle

Circle is a subclass of class Graphical Object.

Circle data (in addition to Graphical Object data):
  radius of the circle

Circle methods:
  INITIALIZE
    Arguments:
      Starting y position
      Starting x position
      Radius
    Send the INITIALIZE message to class Graphical Object.
    Store the radius in the Circle data.
    Send the DRAW message to the Circle.
  DRAW
    Argument:
      Color of the circle to be drawn.
    Draw the circle on the screen.
  MOVE
    Inherited from the class Graphical Object.

Class Square

Square is a subclass of class Graphical Object.

Square data:
  the length of a side of the square

Square methods:
  INITIALIZE
    Arguments:
      Starting y position
      Starting x position
      Size
    Send the INITIALIZE message to class Graphical Object.
    Store the size in the Square data.
    Send the DRAW message to the Square.
  DRAW
    Argument:
      Color of the square to be drawn.
    Draw the square on the screen.
  MOVE
    Inherited from the class Graphical Object.

Class Double_circle

Class Double_circle is a subclass of class Circle.

Double_circle data:
  Same as for a Circle.

Double_circle methods:
  INITIALIZE
    Inherited from class Circle.
  DRAW
    Argument:
      Color of the Double_circle to be drawn.
    Draw a circle on the screen.
    Draw a slightly smaller concentric circle on the screen.
  MOVE
    Inherited from class Circle.

Listing 2

001 /* obj.h - Interface to module for object oriented
002    programming in C.
    */
003
004 struct class {
005     int size;              /* size of data */
006     int nbr_methods;
007     void (**method)();
008 };
009
010 typedef struct class CLASS;
011
012 typedef struct {
013     void *data;
014     CLASS *class;
015 } OBJECT;
016
017 void new_class(CLASS *class, CLASS *super_class,
018     int nbr_methods, int size);
019 void reg_method(CLASS *class, int mth, void (*fcn)());
020 void new_object(OBJECT *obj, CLASS *class);
021 void message(OBJECT *obj, int msg, ...);
022 void free_object(OBJECT *obj);
023 void free_class(CLASS *class);

Listing 3

001 #include <stdio.h>
002 #include <stdlib.h>
003 #include <stdarg.h>
004 #include "utility.h"
005 #include "obj.h"
006
007 void new_class(CLASS *class, CLASS *super_class,
008     int nbr_methods, int size)
009 {
010     int x;
011     class->nbr_methods = nbr_methods;
012     class->size = size;
013     class->method =
014         (void (**)())malloc
015         ((unsigned)(nbr_methods * sizeof(void (*)())));
016     for (x = 0; x < nbr_methods; ++x)
017         class->method[x] = (void *)NULL;
018     if (super_class != NULL)
019         for (x = 0; x < super_class->nbr_methods &&
020                 x < class->nbr_methods; ++x)
021             class->method[x] = super_class->method[x];
022 }
023
024 void free_class(CLASS *class)
025 {
026     free(class->method);
027 }
028
029 /* register a method with a class */
030 void reg_method(CLASS *class, int mth, void (*fcn)())
031 {
032     if (mth < 0 || mth >= class->nbr_methods)
033         fatal("attempting to register an invalid method");
034     class->method[mth] = fcn;
035 }
036
037 /* initialize an object */
038 void new_object(OBJECT *obj, CLASS *class)
039 {
040     void *v;
041     obj->class = class;
042     v = malloc((unsigned)class->size);
043     if (v == NULL)
044         fatal("smalloc failed");
045     obj->data = (void *)((unsigned char *)v);
046 }
047
048 /* send a message to an object */
049 void message(OBJECT *obj, int msg, ...)
050 {
051     va_list arg_ptr;
052     va_start(arg_ptr, msg);
053     if (obj->class->method[msg] == NULL)
054         fatal("no method for this class");
055     (*obj->class->method[msg])(obj, arg_ptr);
056     va_end(arg_ptr);
057 }
058
059 /* free the data allocated for an object */
060 void free_object(OBJECT *obj)
061 {
062     free(obj->data);
063 }

Listing 4

001 /* interface to utility module */
002
003 extern int g_white;
004 extern int g_black;
005
006 void fatal(char *s);
007 void g_init(void);
008 void cleanup(void);
009 void g_circle(int y, int x, int radius, int color);
010 void g_square(int y, int x, int size, int color);

Listing 5

001 #include <stdio.h>
002 #include <stdlib.h>
003 #include <stdarg.h>
004 #include "utility.h"
005 #ifdef __ZTC__
006 #include <fg.h>
007 #else
008 #include <graph.h>
009 #endif
010
011 int g_white;
012 int g_black;
013
014 void fatal(char *s)
015 {
016     printf("FATAL ERROR: %s\n", s);
017     exit(1);
018 }
019
020 void trace(char *fmt, ...)
021 {
022     static FILE *outfp = NULL;
023     va_list arg_ptr;
024     va_start(arg_ptr, fmt);
025     if (outfp == NULL) {
026         unlink("tf");
027         if ((outfp = fopen("tf", "w")) == NULL)
028             fatal("fopen failed\n");
029         setbuf(outfp, NULL);
030     }
031     vfprintf(outfp, fmt, arg_ptr);
032     va_end(arg_ptr);
033 }
034
035 /* utility function to put screen in graphics mode */
036 void g_init(void)
037 {
038 #ifdef __ZTC__
039     fg_init_all();
040     g_white = FG_WHITE;
041     g_black = FG_BLACK;
042 #else
043     struct videoconfig this_screen;
044     _getvideoconfig(&this_screen);
045     switch (this_screen.adapter)
046     {
047     case _CGA:
048     case _OCGA:
049         _setvideomode(_HRESBW);
050         break;
051     case _EGA:
052     case _OEGA:
053         _setvideomode(_ERESCOLOR);
054         break;
055     case _VGA:
056     case _OVGA:
057     case _MCGA:
058         _setvideomode(_VRES2COLOR);
059         break;
060     case _HGC:
061         _setvideomode(_HERCMONO);
062         break;
063     default:
064         printf("This program requires a CGA, EGA, MCGA,");
065         printf("VGA, or Hercules card\n");
066         exit(0);
067     }
068     g_white = _getcolor();
069     g_black = 0;
070
#endif
071 }
072
073 /* utility function - wait for a key so we can see
074    graphics, set video mode back to character mode */
075 void cleanup()
076 {
077     int ch;
078     ch = getchar();
079 #ifdef __ZTC__
080     fg_term();
081 #else
082     _setvideomode(_DEFAULTMODE);
083 #endif
084 /*lint -esym(550,ch) */
085 }
086 /*lint +esym(550,ch) */
087
088 void g_circle(int y, int x, int radius, int color)
089 {
090 #ifdef __ZTC__
091     fg_drawarc((fg_color_t)color, FG_MODE_SET, ~0, x, y,
092         radius, 0, 3600, fg_displaybox);
093 #else
094     _setcolor(color);
095     _ellipse(_GBORDER, x - radius, y - radius, x + radius,
096         y + radius);
097 #endif
098 }
099
100 void g_square(int y, int x, int size, int color)
101 {
102 #ifdef __ZTC__
103     int hs;
104     fg_box_t box;
105     hs = size / 2;
106     box[FG_X1] = x - hs;
107     box[FG_Y1] = y - hs;
108     box[FG_X2] = x + hs;
109     box[FG_Y2] = y + hs;
110     fg_drawbox((fg_color_t)color, FG_MODE_SET, ~0,
111         FG_LINE_SOLID, box, fg_displaybox);
112 #else
113     int hs;
114     hs = size / 2;
115     _setcolor(color);
116     _rectangle(_GBORDER, x - hs, y - hs, x + hs, y + hs);
117 #endif
118 }

Listing 6

001 #include <stdio.h>
002 #include <stdlib.h>
003 #include <stdarg.h>
004 #include "utility.h"
005 #include "obj.h"
006
007 /* methods for graphical_object, circle, double_circle, square */
008 #define INIT 0
009 #define DRAW 1
010 #define MOVE 2
011
012 /********************************************************/
013 /* CLASS GRAPHICAL OBJECT */
014
015 struct graph_obj_s {
016     int y;
017     int x;
018 };
019
020 typedef struct graph_obj_s GRAPH_OBJ_T;
021 #define GRAPH_OBJ_SIZE sizeof(GRAPH_OBJ_T)
022 #define GRAPH_OBJ_OFFSET 0
023
024 /* graph_obj_init(object, y_position, x_position); */
025 void graph_obj_init(OBJECT *obj, va_list arg_ptr)
026 {
027     GRAPH_OBJ_T *g;
028     g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
029         GRAPH_OBJ_OFFSET);
030     g->y = va_arg(arg_ptr, int);
031     g->x = va_arg(arg_ptr, int);
032 }
033
034 /* graph_obj_move(object, distance_y, distance_x);
    */
035 void graph_obj_move(OBJECT *obj, va_list arg_ptr)
036 {
037     GRAPH_OBJ_T *g;
038     g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
039         GRAPH_OBJ_OFFSET);
040     message(obj, DRAW, g_black);
041     g->y += va_arg(arg_ptr, int);
042     g->x += va_arg(arg_ptr, int);
043     message(obj, DRAW, g_white);
044 }
045
046 /********************************************************/
047 /* CLASS CIRCLE */
048
049 struct circle_s {
050     int radius;
051 };
052
053 typedef struct circle_s CIRCLE_T;
054 #define CIRCLE_SIZE sizeof(CIRCLE_T) + GRAPH_OBJ_SIZE
055 #define CIRCLE_OFFSET sizeof(GRAPH_OBJ_T)
056
057 /* circle_init(object, y_position, x_position, radius); */
058 void circle_init(OBJECT *obj, va_list arg_ptr)
059 {
060     CIRCLE_T *c;
061     graph_obj_init(obj, arg_ptr);
062     (void)va_arg(arg_ptr, int);
063     (void)va_arg(arg_ptr, int);
064     c = (CIRCLE_T *)((unsigned char *)obj->data + CIRCLE_OFFSET);
065     c->radius = va_arg(arg_ptr, int);
066     message(obj, DRAW, g_white);
067 }
068
069 /* circle_draw(object, color); */
070 void circle_draw(OBJECT *obj, va_list arg_ptr)
071 {
072     int color;
073     CIRCLE_T *c;
074     GRAPH_OBJ_T *g;
075     c = (CIRCLE_T *)((unsigned char *)obj->data + CIRCLE_OFFSET);
076     g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
077         GRAPH_OBJ_OFFSET);
078     color = va_arg(arg_ptr, int);
079     /* g_circle(g->y, g->x, c->radius, va_arg(arg_ptr, int)); */
080     g_circle(g->y, g->x, c->radius, color);
081 }
082
083 /********************************************************/
084 /* CLASS SQUARE (very similar to CIRCLE) */
085
086 struct square_s {
087     int size;
088 };
089
090 typedef struct square_s SQUARE_T;
091 #define SQUARE_SIZE sizeof(SQUARE_T) + GRAPH_OBJ_SIZE
092 #define SQUARE_OFFSET sizeof(GRAPH_OBJ_T)
093
094 /* square_init(object, y_position, x_position, size); */
095 void square_init(OBJECT *obj, va_list arg_ptr)
096 {
097     SQUARE_T *s;
098     graph_obj_init(obj, arg_ptr);
099     (void)va_arg(arg_ptr, int);
100     (void)va_arg(arg_ptr, int);
101     s = (SQUARE_T *)((unsigned char
        *)obj->data + SQUARE_OFFSET);
102     s->size = va_arg(arg_ptr, int);
103     message(obj, DRAW, g_white);
104 }
105
106 /* square_draw(object, color); */
107 void square_draw(OBJECT *obj, va_list arg_ptr)
108 {
109     SQUARE_T *s;
110     GRAPH_OBJ_T *g;
111     s = (SQUARE_T *)((unsigned char *)obj->data + SQUARE_OFFSET);
112     g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
113         GRAPH_OBJ_OFFSET);
114     g_square(g->y, g->x, s->size, va_arg(arg_ptr, int));
115 }
116
117 /********************************************************/
118 /* CLASS DOUBLE CIRCLE (sub-class of CIRCLE) */
119
120 #define DOUBLE_CIRCLE_SIZE CIRCLE_SIZE
121
122 /* double_circle_draw(object, color); */
123 void double_circle_draw(OBJECT *obj, va_list arg_ptr)
124 {
125     int color;
126     CIRCLE_T *c;
127     GRAPH_OBJ_T *g;
128     c = (CIRCLE_T *)((unsigned char *)obj->data + CIRCLE_OFFSET);
129     g = (GRAPH_OBJ_T *)((unsigned char *)obj->data +
130         GRAPH_OBJ_OFFSET);
131     color = va_arg(arg_ptr, int);
132     g_circle(g->y, g->x, c->radius, color);
133     g_circle(g->y, g->x, c->radius - 2, color);
134 }
135
136 /********************************************************/
137
138 int main(int argc, char **argv);
139 int main(int argc, char **argv)
140 {
141     int x;
142
143     CLASS graph_obj;
144     CLASS circle;
145     CLASS square;
146     CLASS double_circle;
147
148     OBJECT c1;
149     OBJECT s1;
150     OBJECT dc1;
151
152     g_init();
153
154     /* make class graphical object */
155     new_class(&graph_obj, NULL, 3, GRAPH_OBJ_SIZE);
156     reg_method(&graph_obj, INIT, graph_obj_init);
157     reg_method(&graph_obj, MOVE, graph_obj_move);
158
159     /* make class circle */
160     new_class(&circle, &graph_obj, 3, CIRCLE_SIZE);
161     reg_method(&circle, INIT, circle_init);
162     reg_method(&circle, DRAW, circle_draw);
163
164     /* make class square */
165     new_class(&square, &graph_obj, 3, SQUARE_SIZE);
166     reg_method(&square, INIT, square_init);
167     reg_method(&square, DRAW, square_draw);
168
169     /* make class double_circle */
170     new_class(&double_circle, &circle, 3,
        DOUBLE_CIRCLE_SIZE);
171     reg_method(&double_circle, DRAW, double_circle_draw);
172
173     /* make a circle object */
174     new_object(&c1, &circle);
175     message(&c1, INIT, 40, 40, 20);
176
177     /* make a square object */
178     new_object(&s1, &square);
179     message(&s1, INIT, 40, 100, 20);
180
181     /* make a double circle object */
182     new_object(&dc1, &double_circle);
183     message(&dc1, INIT, 40, 160, 20);
184
185     for (x = 0; x < 100; ++x) {
186         message(&c1, MOVE, 1, 1);
187         message(&s1, MOVE, 1, 0);
188         message(&dc1, MOVE, 0, -1);
189     }
190
191     free_object(&c1);
192     free_object(&s1);
193     free_object(&dc1);
194
195     free_class(&graph_obj);
196     free_class(&circle);
197     free_class(&square);
198     free_class(&double_circle);
199
200     cleanup();
201
202     return (0);
203 }

Listing 7

001 #include <stdio.h>
002 #include <stdlib.h>
003 #include "utility.h"
004
005 /*********************************************************/
006 /* CLASS GRAPHICAL OBJECT */
007
008 class graph_obj {
009 public:
010     int y;
011     int x;
012     void init(int y, int x);
013     void move(int y, int x);
014     virtual void draw(int color){};
015 };
016
017 void graph_obj::init(int y2, int x2)
018 {
019     y = y2;
020     x = x2;
021 }
022
023 void graph_obj::move(int y_delta, int x_delta)
024 {
025     draw(g_black);
026     x += x_delta;
027     y += y_delta;
028     draw(g_white);
029 }
030
031 /*********************************************************/
032 /* CLASS CIRCLE */
033
034 class circle: public graph_obj {
035 public:
036     int radius;
037     void init(int y, int x, int radius);
038     void draw(int color);
039 };
040
041 void circle::init(int y2, int x2, int radius2)
042 {
043     graph_obj::init(y2, x2);
044     radius = radius2;
045     draw(g_white);
046 }
047
048 void circle::draw(int color)
049 {
050     g_circle(y, x, radius, color);
051 }
052
053 /*********************************************************/
054 /* CLASS SQUARE */
055
056 class square: public graph_obj {
057 public:
058     int size;
059     void init(int y, int x, int size);
060     void draw(int color);
061 };
062
063 void square::init(int y2, int x2, int size2)
064 {
065     graph_obj::init(y2, x2);
066     size = size2;
067     draw(g_white);
068 }
069
070 void square::draw(int color)
071 {
072     g_square(y, x, size, color);
073 }
074
075 /*********************************************************/
076 /* CLASS DOUBLE_CIRCLE */
077
078 class double_circle: public circle {
079 public:
080     void draw(int color);
081 };
082
083 void double_circle::draw(int color)
084 {
085     g_circle(y, x, radius, color);
086     g_circle(y, x, radius - 2, color);
087 }
088
089 /********************************************************/
090
091 int main(void);
092 int main(void)
093 {
094     int x;
095     circle c1;
096     square s1;
097     double_circle dc1;
098     g_init();
099     c1.init(40, 40, 20);
100     s1.init(40, 100, 20);
101     dc1.init(40, 160, 20);
102     for (x = 0; x < 100; ++x) {
103         c1.move(1, 1);
104         s1.move(1, 0);
105         dc1.move(0, -1);
106     }
107     cleanup();
108     return (0);
109 }

Tools For MS-DOS Directory Navigation

Leor Zolman

Leor Zolman wrote "BDS C", the first C compiler designed exclusively for personal computers. Since then he has designed and taught programming workshops and has also been involved in personal growth workshops as both participant and staff member. He STILL doesn't hold any degrees. His latest incarnation is as a CUJ staff member.

As an MS-DOS user with a large amount of hard disk space to manage, I frequently find myself cd-ing all over the system in pursuit of source files and data. The standard MS-DOS command processor COMMAND.COM has a bare-bones repertoire of options for facilitating system navigation, and it is full of idiosyncrasies. For instance, to change directly to an arbitrary drive and user area, the user must enter the drive selector and the path specification as two separate commands. Switching from the root directory of drive C: to the \work directory on drive D: requires the command sequence:

C:\>d:          (select D:)
D:\>cd work     (change to the desired directory)
D:\WORK>...
(All examples assume the PROMPT environment variable is set to $p$g so that COMMAND.COM will display the current path as part of the system prompt.) If the user attempts to select a different drive and a path with one command, he will find that apparently nothing has happened:

C:\>cd d:\work
C:\>...

Actually, the system has selected the specified path to be active on the specified drive, but the specified drive has not been made current. The system maintains a current working directory for each logical drive. If the user were then to select that other drive, i.e.,

C:\>d:
D:\WORK>...

then the selected path would show up as the current directory. Another "missing" feature in the standard command environment is a simple directory-name aliasing mechanism, so that one can switch quickly to commonly-used directories even if the path name happens to be lengthy. MS-DOS does provide a simplistic facility (the subst command) to relate an arbitrary path to a new drive designator, but subst isn't really adequate: the alias name is limited to a single letter, and there is no facility for viewing all active assignments. I would prefer the ability to assign arbitrary mnemonics to arbitrary paths, and to have those mnemonics recognized when specified in cd commands. I would also like some clean mechanism for instantly switching to the previous directory -- even if I've forgotten what it was.

The Answer

To address these needs, I wrote an extended cd command that supports combined drive and path specifications, and a companion command that returns the user to the previous directory (taking the directory specification from information recorded in an environment variable by the extended cd command). The cd replacement stores the old full path name in an environment variable before switching to the new specified path, and the companion command reads this environment variable and returns to the original directory upon request.
Since the extended cd must modify its parent's environment, it uses the functions for modifying the master environment that appeared in the July 1989 issue of CUJ. CDE (for CD Extended) works like MS-DOS's cd command, except for the following special cases: When both a drive designator and a path name are specified, the specified drive is immediately selected together with the path. When the argument is identified as the name of an existing MS-DOS environment variable, the named variable is assumed to contain a path name, which is substituted as the path to switch to. In support of the "return to previous directory" feature, I decided to implement a "directory stack" mechanism. This stack is maintained via environment variables, and the user may select the naming convention for those variables by customizing the #define statements in CDERET.H. (See Listing 1.) One master environment variable (I call it CHAINS) specifies the maximum size of the directory stack. When CDE is first invoked, it checks whether the CHAINS variable has been previously defined in the environment; if so, its current value is used. If not, CDE initializes CHAINS to a default value (also specified by a definition in the header file). Thus, the user has the option of setting the value of CHAINS explicitly (using the standard built-in set command) or allowing CDE to handle the initialization of CHAINS automatically. (See Listing 2.) A "stack" of size CHAINS is represented by a set of environment variables sharing a common base name (I use CHAIN) with position numbers appended. Thus, with CHAINS=3, after several CDEs the environment variables CHAIN1, CHAIN2, and CHAIN3 would exist to store the pertinent path names in the environment. Every time CDE is used to change directories, it "pushes" the old current working directory "on the stack" by reassigning all the relevant environment variables. CHAIN1 is always the top of the stack; CHAINn (where n = CHAINS) is the base.
Since there is no disk activity involved, this process is quite fast. The RET command (Listing 3) "returns" to the previous directory (either specified by CHAIN1 or undefined), then "pops" the stack by reassigning all the active environment variables in reverse order. As long as CHAINS is greater than 1, the directory stack behaves as described above and successive uses of RET unravel the stack. When CHAINS is set to 1, RET considers this a special case: after returning to the directory specified by CHAIN1, CHAIN1 is reset to the name of the directory that was current at the time of the RET call. Thus, repeated uses of RET with CHAINS equal to 1 effect a "toggle" between two directories. Depending on the way your system is organized, this toggling mechanism may be more useful to you than the directory stack mechanism.

Icing

The directory aliasing feature is activated by simply setting an environment variable to the full path desired, then using that environment variable name as a parameter to CDE. For example,

    C:\>set WORK=d:\project\subproj\new\testing
    C:\>cde work
    D:\PROJECT\SUBPROJ\NEW\TESTING>...

As a special case, for convenience, giving the CDE command without any arguments will cause CDE to look for a special environment variable (I call it HOME) and switch to the directory it specifies. If you spend much of your time headquartered at one particular directory, this is an easy way to go back to it from anywhere in the system, regardless of the state of the directory stack. The current directory at the time this special form of CDE is given will, as usual, be recorded in the environment by CDE in case you want to use RET from the HOME directory. When setting environment variables in general, be careful not to type spaces between the end of the variable name and the = sign. DOS would keep the space as part of the variable name, and things wouldn't work.
The CDE program will handle spaces after the = sign (and before the text) with no problem, but it's probably safer to be consistent and use no spaces whatsoever.

Implementation

Both CDE.C and RET.C have two phases of operation: the first phase performs the required drive/directory selection, and the second phase updates the related environment variables. If the first phase fails, then the programs exit immediately; there's no need to update environment variables if the current directory wasn't changed.

To obtain the name of the target directory in phase one, RET simply accesses the CHAIN1 environment variable. If the variable does not exist, then CDE has never been run and an appropriate message is displayed. If CHAIN1 exists, it specifies the target path. CDE.C gets its target path name from the command line. If the name happens to be the name of an active environment variable, then the value of the variable with that name is used to obtain the target path.

The directory selection process itself is identical for both commands and takes two steps: the selection of the logical drive and the selection of the desired directory. The drive is selected first; if that fails, we quit and no harm has been done. Once the new drive has been selected, then the new path is selected. If that fails, we have to go back and reinstate the original drive. If it succeeds, we're done with phase one.

Phase two for RET.C is relatively straightforward. If CHAINS is equal to 1, then the CHAIN1 environment variable is set to the original current directory name (before phase one) in order to support the toggling feature. For other values of CHAINS, the directory stack is "popped" by looping to reassign each CHAINn variable to the value of its next higher counterpart.

CDE's phase two begins by making sure the CHAINS environment variable, used to specify the stack size, is present and initialized. If it exists, its value is assigned to the program variable chaincnt.
If CHAINS does not exist, then it is initialized to the default value (specified by the symbolic string constant DEFAULT_CHAINS). Finally the directory stack is "pushed" by copying each CHAINn variable (for n = 1 to CHAINS-1) to its next higher counterpart. CHAIN1 is a special case; it is assigned to the name of the current directory before phase one was completed.

Configuration

The following symbolic constants may be changed to suit your own preferences:

    CHAINS_VAR      The master directory chain size control variable
    CHAIN_BASE      The "base" name of directory stack variables
    DEFAULT_CHAINS  The default value for CHAINS_VAR (in quotes)
    HOME_NAME       The name of the env. variable for home directory

The CDE.EXE and RET.EXE commands should be placed in a directory that is somewhere in your system search path. (I use c:\bin for all my personal utilities.)

System-Dependent Functions

The two areas of high compiler-dependency in this application, direct console I/O and DOS logical drive selection, have been isolated in a separate utility library named UTIL.C (Listing 4). The only support function required by the functions in UTIL.C is the bdos function typically supplied with most popular compiler libraries. If you need to write the bdos function yourself, the prototype is shown at the top of the UTIL.C source file. It takes an interrupt (int 21h) function number, a DX register value, and an AL register value as parameters (although the AL parameter is not needed for this application). The bdos function can easily be written in terms of any of the more general operating system interface functions (int86(), intdos(), etc.) you may have available. To keep the commands' .EXE file sizes as short as possible, all messages are displayed on the console using direct console I/O calls (through bdos facilities) so as not to require the file I/O support to be dragged into the linkages.
The UTIL.C functions cputs() and putch() are similar to their namesakes in the Microsoft library and are provided here for the benefit of users of compiler packages that do not include these functions. The setdrive() function I provide is cleaner than Microsoft's _dos_setdrive(). The library functions chdir() and getcwd() are used by the commands and should be available in your compiler's standard library. When compiled with optimization, both CDE.EXE and RET.EXE weigh in at just over 6K, so their load-and-run time is negligible.

Caveats

The following line in your CONFIG.SYS file will ensure plenty of environment space for the CHAIN variables:

    shell = c:\command.com /p /e:1500

Due to an as-yet-inexplicable MS-DOS anomaly, specifying too small a value for the environment (xxxx in /e:xxxx) may cause the system to hang up after CDE or RET completes execution. The message I've gotten says something about COMMAND.COM being "invalid". While this has never been destructive, it has required a re-boot of the system. The only way I've found (so far) to avoid this problem is to allocate plenty of extra environment space. If anyone has a more "bulletproof" solution, please let us know here at CUJ.

I highly recommend that one modification be made to the Master Environment package as listed in the 7/89 CUJ: the environment variable name should be converted to upper case in both the m_getenv() and m_delenv() functions. As written, only the m_putenv() function converts the name to upper case, and this causes failure when either m_getenv() or m_delenv() is called with lower-case variable names. To make this change, alter the lines reading:

    n = name;

to:

    n = strupr(name);

There is one such line near the beginning of both the m_getenv() and m_delenv() functions.

Linking

The commands to compile and link CDE.C and RET.C using Microsoft C are shown at the top of the source file listings.
I arbitrarily named the master environment package ENVLIB.OBJ, so including envlib on the qcl command line links in the object module.

Summary

The CDE and RET commands provide a clean, quick and convenient mechanism for alleviating some of MS-DOS's command processor limitations. Although there are plenty of full-blown command processor replacements, shells and special-purpose TSRs out there (even for free) that offer alternative ways to "get around" your DOS system, few (if any) of these can offer 100% compatibility with all other packages and TSRs, zero bytes of system RAM overhead (unless you count the few extra bytes of environment space required), and virtually instantaneous gratification. And you even get the source code!

Listing 1

/*
 * UTIL.H: Includes and definitions for the CDE/RET
 * Directory Navigation utilities.
 */

#define MAX_DIRNAME_SIZE   100      /* longest conceivable directory name size */
#define MAX_EVARNAME_SIZE  20       /* max length of env. var. names created */
#define DEFAULT_CHAINS     "1"      /* initial default dir. stack size */
#define CHAINS_VAR         "CHAINS" /* name of env. var. controlling stack size */
#define CHAIN_BASE         "CHAIN"  /* base name of env. vars holding dir names */
#define HOME_NAME          "HOME"   /* Name of 'home dir' environment variable */

/*
 * Prototypes for utility functions in CDERET.C:
 */
void error(char *msg);
int cputs(char *txt);
int putch(char c);
int setdrive(int drive_no);
int getdrive();
void change_dir(char *newpath);

/*
 * Prototypes for Master Environment Control routines
 * (functions from CUJ 7/89)
 */
char *m_getenv(char *name);
int m_putenv(char *name, char *text);
int m_delenv(char *name);

Listing 2

/*
 * CDE.C: Extended "cd" command for MS-DOS.
 * Written by Leor Zolman, 9/20/89
 *
 * Features:
 *  1) Allows changing to another drive and directory in one step
 *  2) Supports directory aliasing through environment variables
 *  3) With no arguments, optionally switches to 'home' directory
 *     (if the HOME environment variable is currently defined)
 *  4) Manages a "previous directory" stack through environment
 *     variables. The number of entries in the stack is dynamically
 *     configurable through a special controlling environment variable.
 *  5) For the special case of stack size = 1, toggling back and forth
 *     between two directories is supported
 *
 * Usage:
 *  cde [d:] [path]  (changes to given drive/directory)
 *  cde <envvar>     (indirect dir change on environment variable)
 *  cde              (changes to HOME directory, if defined, or
 *                    returns current working directory otherwise)
 *
 * Compile/Link:
 *  cl /Ox cde.c util.c envlib   (where ENVLIB.OBJ is Master Env. Pkg.)
 *
 * Uses the Master Environment library from CUJ 7/89.
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <direct.h>
#include "util.h"

main(int argc, char **argv)
{
    char *pathp;
    char cwdbuf[MAX_DIRNAME_SIZE];          /* buffer for current dir name */
    int chaincnt;                           /* size of dir stack */
    char chaincnt_txt[10], *chaincntp;
    char chnevar1[MAX_EVARNAME_SIZE],       /* env var names built here */
         chnevar2[MAX_EVARNAME_SIZE];
    char chndname_save[MAX_DIRNAME_SIZE], *chndname;
    char itoabuf[10];                       /* used by itoa() function */
    int i;

    /* Get current dir. name and current drive: */
    getcwd(cwdbuf, MAX_DIRNAME_SIZE);

    if (argc == 1)                          /* if no args given, */
        if (pathp = m_getenv(HOME_NAME))    /* if HOME directory defined, */
        {
            change_dir(pathp);              /* then try to change to it.
                                               */
            strcpy(chnevar1, CHAIN_BASE);   /* set top-stack env var */
            strcat(chnevar1, "1");
            if (m_putenv(chnevar1, cwdbuf)) /* to old dir */
                error("Error setting environment variable");
            return 0;
        }
        else
        {                                   /* just print current working dir */
            cputs(cwdbuf);
            putch('\n');
            return 0;
        }

    if (argc != 2)
        error("Usage: cde [d:][newpath] or <envvar>\n");

    pathp = argv[1];                        /* skip whitespace in pathname */
    if (chndname = m_getenv(pathp))         /* if env-var-name given, */
        pathp = chndname;                   /* use its value as new path */
    change_dir(pathp);

    /* Read or initialize master chain length variable: */
    if ((chaincntp = m_getenv(CHAINS_VAR)) == NULL)
        if (m_putenv(CHAINS_VAR,
                strcpy(chaincntp = chaincnt_txt, DEFAULT_CHAINS)))
            error("Error creating environment variable");

    /* Update the environment directory chain: */
    chaincnt = atoi(chaincntp);
    for (i = chaincnt; i > 0; i--)
    {
        /* construct name of previous dirname variable: */
        if (i != 1)
        {
            strcpy(chnevar2, CHAIN_BASE);
            strcat(chnevar2, itoa(i - 1, itoabuf, 10));
        }
        if (chndname = ((i != 1) ? m_getenv(chnevar2) : cwdbuf))
        {
            /* copy value of prev. to current */
            strcpy(chndname_save, chndname);    /* m_putenv() bashes it */
            strcpy(chnevar1, CHAIN_BASE);
            strcat(chnevar1, itoa(i, itoabuf, 10));
            if (m_putenv(chnevar1, chndname_save))
                error("Error setting environment variable");
        }
    }
    return 0;
}

Listing 3

/*
 * RET.C: Return to previous working directory
 * Written by Leor Zolman, 9/89
 *
 * (companion to CDE.C)
 * Uses the Master Environment package from CUJ 7/89
 *
 * Usage:
 *  ret     (returns to previous directory)
 *
 * Compile/Link:
 *  cl /Ox ret.c util.c envlib   (ENVLIB.OBJ is Master Environment pkg)
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <direct.h>
#include "util.h"

main(int argc, char **argv)
{
    char *pathp;
    char cwdbuf[MAX_DIRNAME_SIZE];
    int chaincnt;
    char chnevar1[MAX_EVARNAME_SIZE],   /* env var names built here */
         chnevar2[MAX_EVARNAME_SIZE];
    char chndname_save[MAX_DIRNAME_SIZE], *chndname;
    char itoabuf[10];                   /* used by itoa() function */
    int i;

    /* Get current dir.
       name and current drive: */
    getcwd(cwdbuf, MAX_DIRNAME_SIZE);

    if (argc != 1)
        error("Usage: ret   (returns to last dir cde'd from)");

    if ((pathp = m_getenv(CHAINS_VAR)) == NULL)
        error("cde hasn't been run yet");
    else
        chaincnt = atoi(pathp);

    /* See if CDE has created any entries: */
    strcpy(chnevar1, CHAIN_BASE);
    strcat(chnevar1, "1");
    if (!(pathp = m_getenv(chnevar1)))  /* if so, pathp points to last dir */
        error("No previous directory"); /* else no previous dir */

    change_dir(pathp);                  /* change to previous directory */

    /* Update the environment directory chain: */
    if (chaincnt == 1)                  /* special case: record old dir */
    {
        if (m_putenv(chnevar1, cwdbuf))
            error("Error setting environment variable");
        return 0;
    }

    for (i = 1; ; i++)
    {
        /* get names of current and next dirname variables */
        strcpy(chnevar1, CHAIN_BASE);
        strcat(chnevar1, itoa(i, itoabuf, 10));
        strcpy(chnevar2, CHAIN_BASE);
        strcat(chnevar2, itoa(i + 1, itoabuf, 10));
        if (!(chndname = m_getenv(chnevar2)))
            break;                      /* found end of saved chain */
        /* copy value of next higher to current */
        strcpy(chndname_save, chndname);    /* m_putenv() bashes it */
        if (m_putenv(chnevar1, chndname_save))
            error("Error setting environment variable");
    }
    return 0;
}

Listing 4

/*
 * UTIL.C: Utility functions for CDE/RET package
 *
 * These functions rely on the "bdos" library function
 * from your compiler's library.
 * Prototype:
 *
 *  int bdos(int dosfn, unsigned dosdx, unsigned dosal);
 */

#include <stdlib.h>
#include <ctype.h>
#include <direct.h>
#include "util.h"

/*
 * Print error msg and abort:
 */
void error(char *msg)
{
    cputs("cde: ");
    cputs(msg);
    putch('\n');
    exit(-1);
}

/*
 * Change to specified drive/path, terminate program on error:
 */
void change_dir(char *new_path)
{
    int old_drive;

    old_drive = getdrive();
    while (*new_path && isspace(*new_path))     /* skip whitespace */
        new_path++;
    if (new_path[1] == ':')                     /* if drive designator */
    {                                           /* given, then set drive */
        if (setdrive(tolower(*new_path) - 'a'))
            error("Can't select given drive\n");
        new_path += 2;
    }
    if (*new_path && chdir(new_path))   /* If path given, set new path. */
    {
        setdrive(old_drive);            /* If error, restore drive */
        error("Can't change to given path");
    }
}

/*
 * DOS functions, written in terms of the "bdos" function:
 */
int cputs(char *txt)                    /* display msg, console I/O only */
{
    char c;

    while (c = *txt++)
    {
        if (c == '\n')
            putch('\r');
        putch(c);
    }
    return 0;
}

int putch(char c)                       /* display a char on console */
{
    return bdos(2, c, 0);
}

int setdrive(int drive_no)              /* set logical drive. Return */
{                                       /* non-zero on error. */
    int after;

    bdos(0x0E, drive_no, 0);
    after = bdos(0x19, 0, 0);
    if ((after & 0xff) == drive_no)     /* low 8 bits are new drive no. */
        return 0;                       /* set correctly */
    else
        return -1;                      /* error */
}

int getdrive()                          /* return current logical drive */
{
    return bdos(0x19, 0, 0);
}

Dealing With Memory Allocation Problems

Dear Mr. Ward: I am not much of a letter writer, but after reading the July 89 issue of The C Users Journal I felt I could save some of your readers a lot of time tracking down a problem with the Microsoft C, version 5.10 memory allocation routines. Enclosed is a listing and the output from the program. This may help Steven Isaacson who is having memory allocation problems using Vitamin C. I found this problem after a week of tracking down a memory leak problem in a very large application.
My final solution was to write my own malloc()/free() routines that call DOS directly. This will let the DOS allocator do what it is supposed to do. No time penalty was noticed in our application. Note if you do write your own malloc()/free() routines, call them something else! MSC uses these routines internally and makes assumptions about what data is located outside the allocated area. I always use a malloc()/free() shell to test for things like memory leaks and the free of a non-allocated block. It also will give you an easy way to install a global 'out of memory' error handler.

The code supplied by Leonard Zerman on finding the amount of free space in a system is simplistic and very limited. A better routine would build a linked list of elements and then the variable vptrarray could be made a single pointer to the head of the list. The entire routine becomes dynamic, much more robust, and there is no danger of overflowing a statically allocated array. See the supplied code for an example. The linked list implementation has the side effect that it will work on a virtual memory system. Why you would want to do this is beyond me, but it could be considered a very time consuming way to find out what swapmax is set to on a UNIX system. If you have any questions, please contact me. My phone number is (408) 988-3818. My fax number is (408) 748-1424.

Sincerely yours, Jim Schimandle Primary Syncretics 473 Sapena Court, Unit #6 Santa Clara, CA 95054

Editor's Note: If you couldn't find "Listing 1" in last month's "We Have Mail", you needn't fear the onset of any perceptual disorder -- there was no Listing 1. Usually publishers blame this kind of problem on someone else -- the printer, the typesetter, the proofreader, the paste-up artist. Unfortunately this publisher doesn't have any convenient scapegoats; I pasted up the letters section (something I often do), and failed to include the listing. Anyway, here is the original letter and the promised listing.
This time it will be right -- my staff is doing it. --rlw

Listing 1

/*----------------------------------------------------------------------
++
  membug.c
  Demonstrate MSC malloc() large size problem

  Description
    membug.c demonstrates a problem that occurs when Microsoft C,
    version 5.10 is used to allocate and free large blocks of memory.
    If this program is compiled and run, you will find that the first
    list will have significantly more memory allocated to it. The
    second list will only have 1 to 2 elements allocated to it,
    depending on your memory layout.

    The basic problem is that MSC never deallocates a DOS allocated
    memory block, even if the memory call is about to fail. Thus, the
    first list causes the MSC runtime to allocate memory in 48K
    blocks. When the first list is freed, the 48K blocks remain.
    Then, when the second list is allocated, there are only 2 blocks
    that DOS can carve the 60K blocks from: the default memory
    segment and the last DOS memory block. The default memory segment
    is 64K, so we should always get an allocation from it. The last
    memory block can be expanded by DOS to fit the 60K request if
    your memory layout will allow it.

    Note that if you reverse the order of memory requests, both will
    return the same number of memory blocks because the 48K requests
    will fit in the 60K blocks.
  Compilation
    Compilation is under Microsoft C, version 5.1 using the command:

      cl /W3 /AL membug.c

  Execution
    Execution of the program should use the command line:

      membug > membug.out

+-
  $Log$
--
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dos.h>

/* Local definitions */
/* ----------------- */
#define FIRST_ALLOC_SIZE   48000
#define SECOND_ALLOC_SIZE  60000

/* Memory allocation list structure */
/* -------------------------------- */
typedef struct mb               /* Memory list node */
{                               /* ---------------- */
    struct mb *mb_next;         /* Pointer to next block */
    char mb_data;               /* Start of data area */
                                /* Actual data area size is */
                                /* determined by runtime */
                                /* malloc() argument */
} MEM_BLOCK;

/* Pointer conversion macros */
/* ------------------------- */
#define FARPTR_SEG(a)        ((int) (((unsigned long) (a)) >> 16))
#define FARPTR_OFF(a)        ((int) ((long) (a)))
#define MAKE_FARPTR(seg,off) ((void far *) ((((long) (seg)) << 16) + (off)))

/* Function prototypes */
/* ------------------- */
void main(void);
void DOS_Mem_Display(char *);

/*--------------------------------------------------------------------
+ main
  Entry point for MSC dynamic memory test

  Usage
    void main()

  Parameters
    None

  Description
    main() is the entry point for the Microsoft C dynamic memory
    test. The function allocates a list of FIRST_ALLOC_SIZE elements,
    frees the first list, allocates a second list of
    SECOND_ALLOC_SIZE, and frees the second list. The statistics
    printed out are the total bytes allocated by each allocation and
    a dump of the DOS memory list after each allocation/free.
  Notes
    None
-
*/
void main()
{
    MEM_BLOCK *list;
    MEM_BLOCK *p;
    long first_size;
    long second_size;

    /* Allocate list using first allocation size */
    /* ----------------------------------------- */
    list = NULL;
    first_size = 0;
    while ((p = (MEM_BLOCK *) malloc(FIRST_ALLOC_SIZE)) != NULL)
    {
        p->mb_next = list;
        list = p;
        first_size += FIRST_ALLOC_SIZE;
    }

    /* Print first allocation results */
    /* ------------------------------ */
    printf("***** First allocation - %ld *****\n\n", first_size);
    DOS_Mem_Display("After first allocation\n");

    /* Free first list */
    /* --------------- */
    while (list != NULL)
    {
        p = list;
        list = list->mb_next;
        free(p);
    }
    DOS_Mem_Display("After first free\n");

    /* Allocate list using second allocation size */
    /* ------------------------------------------ */
    list = NULL;
    second_size = 0;
    while ((p = (MEM_BLOCK *) malloc(SECOND_ALLOC_SIZE)) != NULL)
    {
        p->mb_next = list;
        list = p;
        second_size += SECOND_ALLOC_SIZE;
    }

    /* Print second allocation results */
    /* ------------------------------- */
    printf("***** Second allocation - %ld *****\n\n", second_size);
    DOS_Mem_Display("After second allocation\n");

    /* Free second list */
    /* ---------------- */
    while (list != NULL)
    {
        p = list;
        list = list->mb_next;
        free(p);
    }
    DOS_Mem_Display("After second free\n");
}

/*--------------------------------------------------------------*/

DOS_Mem_Display

        psp_seg = *(p+1) + ((*(p+2)) << 8);
        blk_paras = *(p+3) + ((*(p+4)) << 8);
        size = ((long) blk_paras) << 4;
        if (psp_seg == 0)
        {
            prg = (unsigned char far *) "(free)";
            total += size;
        }
        else
        {
            ip = (unsigned int far *) MAKE_FARPTR(psp_seg, 0x2c);
            prg = MAKE_FARPTR(*ip, 0);
            while (*prg != '\0')
            {
                prg += strlen((char *) prg) + 1;
            }
            prg += 3;
        }
        sprintf(str, "%5d %9ld %p", idx++, size, p);
        printf("%s\t%s\n", str, prg);
        if (*p == 'z')
        {
            break;
        }
        p = MAKE_FARPTR(FARPTR_SEG(p) + blk_paras + 1, 0);
    }
    sprintf(str, "Total Free: %ld", total);
    printf("%s\n\n", str);
}
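The malloc()/free() shell Mr. Schimandle recommends is straightforward to sketch in portable C. The names dbg_malloc() and dbg_free() below are my own illustration, not from the letter (which, as it warns, is exactly the point: don't reuse the names malloc and free); a real version might also record file and line information:

```c
#include <stdlib.h>

/* A minimal allocation-tracking shell in the spirit of the letter:
   wrap malloc()/free() under different names, keep a list of live
   blocks, and flag frees of pointers that were never allocated. */

typedef struct tracker {
    void *ptr;
    size_t size;
    struct tracker *next;
} tracker;

static tracker *live = NULL;        /* list of outstanding blocks */
static size_t live_bytes = 0;       /* total bytes not yet freed */

void *dbg_malloc(size_t size)
{
    tracker *t = malloc(sizeof *t);
    void *p = malloc(size);

    if (t == NULL || p == NULL) {
        /* a global 'out of memory' handler could be invoked here */
        free(t);
        free(p);
        return NULL;
    }
    t->ptr = p;
    t->size = size;
    t->next = live;
    live = t;
    live_bytes += size;
    return p;
}

int dbg_free(void *p)               /* returns -1 on free of unknown block */
{
    tracker **tp;

    for (tp = &live; *tp; tp = &(*tp)->next)
        if ((*tp)->ptr == p) {
            tracker *t = *tp;
            *tp = t->next;
            live_bytes -= t->size;
            free(t->ptr);
            free(t);
            return 0;
        }
    return -1;                      /* never allocated: a bug in the caller */
}

size_t dbg_leaked(void)             /* bytes still outstanding */
{
    return live_bytes;
}
```

Calling dbg_leaked() at program exit reports leaked bytes, and a non-zero return from dbg_free() catches frees of non-allocated blocks, the two failure modes the letter mentions.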
/*--------------------------------------------------------------------*/

Standard C Quiet Changes, Part I

P.J. Plauger

P.J. Plauger has been a prolific programmer, textbook author, and software entrepreneur. He is secretary of the ANSI C standards committee, X3J11, and convenor of the ISO C standard committee.

A language standards committee can commit a variety of sins. It can eliminate existing features, so that existing programs that use them generate diagnostics with new translators. It can add lots of new features, so that existing programs trip over them and generate diagnostics. It can even redefine existing features, so that existing programs apparently misuse them and generate diagnostics. All of these are nasty things to do. A committee that indulges in such sins had better be prepared to justify its actions. Discarded features must be arguably dangerous, or at least not worth the clutter they cause by remaining in the language. Added features must fill a real need and not add to the clutter. Changed features require the most justification of all, since they cause the greatest disturbance.

So long as changes cause diagnostics, however, you can live with them. Even if you have to convert half a million lines of existing C code, you know how to proceed. Stuff your code through the new translator and see where it gripes. For very common gripes, you can often contrive a global edit that will mechanically fix up the code. For the rest, you at least have your attention forcibly directed to the areas where you must manually intervene.

The worst sin of all for a language standards committee is to make a change that does not cause a diagnostic. You have a working program with your existing C translator. You upgrade to a standard C compiler and your program quietly recompiles. The only problem is, it behaves differently. That is a project manager's worst nightmare. Even if you generally like the new behavior, you have a serious problem on your hands.
That half a million lines of existing C code may change its behavior in only a handful of places. You cannot rashly assume that the new behavior is acceptable every place. (Probably it is not.) You need to locate every place and check the implications of the change.

Committee X3J11 dubbed such alterations "quiet changes." We blanched every time we faced the prospect of introducing one. We did our best to avoid them. Nevertheless, we occasionally found compelling reasons to adopt quiet changes along with various other subtle but noisy changes. So we made sure that we documented every quiet change we made in the Rationale that accompanies the Standard.

I discussed the most ambitious of these quiet changes last year. (See "Standard C Promotes Types According to Value Preserving Rules," CUJ August '88.) The rules for mixing signed and unsigned integer operands in an expression were, in the past, both subtle and varied. The Committee discussed the different approaches at length before choosing a particular set of "promotion" rules. I did my best to present all the arguments and to justify the choice we eventually made.

This column and the next endeavor to summarize all of the quiet changes made in Standard C. They may not affect you because there have been numerous dialects of C in past practice. (That's a principal reason for making a language standard, to eliminate dialects.) We labeled something a quiet change if any significant dialect of C quietly changed meaning. The change may not affect your favorite dialect. Nevertheless, you should be aware of any possibility of a quiet change in C code. Who knows, you may already have a lurking problem in code moved from a different implementation of C.

In the explanations that follow, I have copied the description of each quiet change almost verbatim from the Rationale for Standard C. They appear in the same order as in the Rationale, which reflects the order of topics presented in the Standard.
The Quiet Changes

"Programs with character sequences such as ??! in string constants, character constants, or header names will now produce different results." For example,

    printf("You said what??!\n");

quietly becomes

    printf("You said what|\n");

This is the result of introducing trigraphs. The committee felt a compelling need to provide a way to represent certain characters unavailable in EBCDIC or the invariant subset of ISO 646. (The characters are [\]^{|}~#.) The alternate forms had to be representable using just the common subset of characters. They also had to be usable within character constants, string literals, and header names. Since existing programs can conceivably contain an arbitrary sequence of characters in these places, we had no way to satisfy these basic requirements without introducing the possibility of a quiet change.

We settled on trigraphs, or three-character sequences, as a compromise. Digraphs might be easier to type, but were more likely to change the meaning of older programs. (C uses all of the characters in the subset, so even code outside quotes and headers is endangered.) Each trigraph begins with two question marks, to minimize the chance of a quiet change. It ends with a character from the subset that is designed more or less to suggest the replacement character. Nobody pretends that ??< is a highly readable alternative to {. But then nobody prevents you from filtering your C code before you send it to a printer. (You might, for example, overstrike a left parenthesis and a minus sign to print a left brace instead of printing the actual trigraph.) Trigraphs serve the limited purpose of providing a minimal interchange standard for shipping C between countries. (Even the Danes, who are adamant that trigraphs are insufficient, have offered no alternative to their use within quotes and header names.)
"A program that depends upon internal identifiers matching only in the first (say) eight characters may change to one with distinct objects for each variant spelling of the identifier." For example,

    int get_stuff_DEF;
    f()
    {
        extern int get_stuff_REF;
        return (get_stuff_REF);
    }

A clever programmer may expect that all the names beginning with get_stuff refer to the same data object. That is no longer true.

There was widespread support for longer names in C. The eight-character significance limit inherited from Ritchie's original implementation is certainly inadequate. Worse, implementations differed on the treatment of "insignificant" characters in a name. (Is an implementation obliged to ignore the extra characters when comparing names? Or is it merely permitted to ignore them?) Further confusing the issue was the distinct, and more severe, limit on external names imposed by old-fashioned linkers.

The committee decided on a three-tiered limitation on names. First, any name can be as long as a logical line. An implementation can choose to inspect all characters when comparing names. Second, an implementation must inspect at least the first 31 characters. It can choose to look at no more than 31 characters. Finally, an implementation may require that external names differ in the first six characters, and ignore case distinctions.

These rules were adopted despite a few notorious cases cited of existing programs that would quietly change. It seems that some implementations ignore characters after the first eight. Some programmers have made a practice of intentionally punning by writing distinct names that are intended to compare equal. I don't recall the rationale for this practice and I don't care. The practice is sufficiently barbaric that it garners little sympathy, even if it can be the victim of a quiet change.
"A program relying on file scope rules may be valid under block scope rules but behave differently -- for instance, if d_struct were defined as type float rather than struct data in the following example:"

    typedef struct data { /* ... */ } d_struct;
    first()
    {
        extern d_struct func();
        /* ... */
    }
    second()
    {
        d_struct n = func();
    }

(This example from the Rationale is not wonderful. I even had to fix a small bug in reproducing it here.) At issue here is the clash between C as a block scoped language and C as a "flat" language with separately compiled modules. The former requires that names be forgotten at the end of the scope in which they are defined. The latter requires that external names be remembered and matched up across separate compilations.

Past implementations differ widely on the treatment of extern declarations within function bodies. Do such declarations percolate out, a block at a time, to file level so they can be matched up with any other file-level declarations for the same name? Or does each such declaration form a worm-hole out to the linker, with the worm-hole forgotten at the end of the block? Or does something even more bizarre occur?

The example above can give different results with different interpretations. In the first case, the declaration of func percolates out from the first function. It is then visible within the second function, so the assignment makes sense. In the second case, the declaration of func goes out of scope at the end of the first function. The second function must assume that func is implicitly declared as an external function returning int. In this case, you get a diagnostic. But change the type definition to float, as the Rationale suggests, and you get a quiet (but erroneous) conversion across the assignment. Like the previous issue on identifier lengths, here is a case where a quiet change is essentially unavoidable. Existing dialects differ too much for the standard to contain a common subset of behavior.
What the committee chose, in fact, was the second behavior. C is a block structured language with holes blown in it. A translator can diagnose conflicting external declarations within a translation unit. It can also elect not to do so, since this is a case of "undefined behavior." A linker can diagnose conflicts between separate compilations. It can also elect not to do so. In practice, most compilers and few linkers will choose to diagnose such conflicts. "Unsuffixed integer constants may have different types. In K&R, unsuffixed decimal constants greater than INT_MAX, and unsuffixed octal or hexadecimal constants greater than UINT_MAX, are of type long." For example, on an implementation where type int occupies 16 bits,

f(32768);           /* argument now 16 bits */
i = 0xFFFF / -10;   /* divide now unsigned */

This is part of the fallout of choosing value-preserving rules for promoting types in expressions (discussed later). The committee felt obliged to tidy up the typing rules for integer constants, to maintain a consistent philosophy toward preserving the expected value of a sub-expression. Ritchie's original rules required that 32768 have type long on an implementation where type int occupies 16 bits. That led to occasional surprises, particularly when writing arguments on function calls. (There were no function prototypes in those days to fix up or diagnose improper argument types.) With value-preserving promotion rules, however, you get the expected result more often by making 32768 type unsigned int. And such a choice is more consistent with the basic philosophy of choosing the "cheapest" type that preserves the value of an expression. Similarly, octal and hexadecimal integer constants are expected to be unsigned. It is silly for one to lose its unsignedness just because its value is too large to be represented as type int. Consistency requires that 0x10000 (on an implementation where type int occupies 16 bits) have type unsigned long instead of long.
In both cases, you can contrive programs that quietly change meaning with the change of typing rules for integer constants. The committee felt, however, that such programs were already at risk in being moved among existing dialects, which supported a variety of promotion rules. "A constant of the form '\078' is valid, but now has different meaning. It now denotes a character constant whose value is the (implementation-defined) combination of the values of the two characters '\07' and '8'. In some implementations the old meaning is the character whose code is 078 == 64." This is a consequence of now disallowing the digits 8 and 9 in octal escape sequences. Even the earliest C compilers have tolerated the practice, and more than a few programs have taken advantage of this tolerance. Nevertheless, the committee felt it was sufficiently barbarous that it had to be dropped. (The committee did not revoke the even more barbarous license to write 111l in place of 111L.) "A constant of the form '\a' or '\x' now may have different meaning. The old meaning, if any, was implementation defined." For example, char letter = 'a'; if (letter == '\a') /* no longer same as 'a' */ The backslash is no longer ignored in front of an arbitrary letter. Worse, Standard C now gives special meaning to \a. The committee felt obliged to add to the list of character escape sequences. The sequence \a stands for the "alert" character. In ASCII, it is the BEL code that rings the bell on old Teletype terminals and makes some sort of electronic beep on modern ones. The sequence \x signals the start of a hexadecimal escape sequence of arbitrary length. Neither of these escape sequences was officially defined in the past. There was the general promise that a backslash before a character with no magic meaning simply stood for that character. (I had, in fact, written a number of strings that used \x as a place holder to be filled in. That was my tough luck.) 
Nevertheless, the addition of \a and \x could cause a quiet change. "A string of the form "\078" is valid, but now has different meaning." See above for the same issue with character constants. The only difference is that the string literal gets longer. Character constants pack all the character codes into a single int value, in an implementation-defined manner. "A string of the form "\a" or "\x" now has different meaning." See above for the same issue with character constants. "It is neither required nor forbidden that identical string literals be represented by a single copy of the string in memory; a program depending upon either scheme may behave differently." For example,

char *s = "abc";
...
if (s != &"abc"[0])
    printf("s has changed\n");

The printed message is correct only if both instances of "abc" become the same data object. This is not guaranteed in Standard C. Here is another case where existing dialects of C were in conflict. Some dialects guarantee that identical string literals are represented by a single copy within a translation unit. Others guarantee that each string literal occupies distinct storage. The committee chose to leave the choice up to the implementation. It is "unspecified," so the implementation need not document the choice or even be consistent in how it chooses. (Another example of unspecified behavior is the order in which a program evaluates multiple arguments on a function call.) Naturally, any program that depends on some particular behavior is likely to be disappointed by some conforming implementation. "Expressions of the form x=-3 change meaning with the loss of the old-style assignment operators." For example,

i =-3;    /* now stores -3 */

It has been many years since UNIX C reversed the assignment operators. Where now you write -= you once wrote =-, as in the example above. Programmers who are stingy or haphazard with spacing around operators got burned often enough that Ritchie switched C to match the Algol 68 convention.
Nevertheless, a number of commercial C compilers retained the old forms for backward compatibility with early C code. Disallowing the old forms can, of course, lead to all sorts of nasty puns. Those who didn't bite the bullet back in the seventies must do so now. Intermission That's about half of the quiet changes documented in the Rationale for the C standard. Tune in next month for the rest of the story. Doctor C's Pointers (R) Header Design And Management Rex Jaeschke Rex Jaeschke is an independent computer consultant, author and seminar leader. He participates in both ANSI and ISO C Standards meetings and is the editor of The Journal of C Language Translation, a quarterly publication aimed at implementers of C language translation tools. Readers are encouraged to submit column topics and suggestions to Rex at 2051 Swans Neck Way, Reston, VA, 22091 or via UUCP at uunet!aussie!rex. All too often, programs just "happen." There is little, if any, serious design done, and programmers "design on the fly," using an approach I call stepwise refinement. That is, you code a bit and test it, then iteratively refine it until it's somewhere close to what you think you want. And after you have hard-coded the same macro definitions and function declarations in ten different places, you think perhaps it would be a good idea to create a header instead. However, this either doesn't get done or it's done at the local level to solve just the particular problem in the code you are currently working on. For the most part, I find people program defensively. Designing and managing headers is an integral part of C project design. It must be done before any code is written to ensure that the design is consistent, can be managed easily, and that a high degree of quality assurance can result. The lack of properly designed headers is a likely recipe for added development, debugging, and maintenance time, as well as significantly reduced reliability.
There are many aspects to designing headers. In this article I will look at those I've recognized. However, before I begin, a definition of the term header is in order. I think you all know what a header is, but for the purposes of this discussion, I will consider a header to be a collection of declarations that can be shared across multiple source files via the #include preprocessing directive. And while a header is typically represented as a source code file on disk, it need not exist as such. For example, a header might actually be built into the compiler (at least the standard ones like math.h could be) or it could be compiled into some binary form that the preprocessor can more easily or efficiently handle. The specific representation details are left to the implementer's choice and will not be further discussed here. As such, I prefer to use the term header rather than header file or include file since the last two names imply a file representation. Whatever term you use, be consistent. Header Categories There are four categories into which headers can be classified: standard, system, library, and application. A standard header is one of the 15 defined by ANSI C, such as stdio.h, math.h, and string.h. ANSI requires you to include standard headers using the notation #include <header.h>. Do so even if #include "header.h" appears to work for them. A standard header is stored in some special place such that it can be accessed from all places in which a source file can be compiled. A system header is one supplied by the compiler vendor that can be used to interface to and/or exploit the host hardware and/or operating system. Examples on MS-DOS systems include bios.h and dos.h; on VAX/VMS, headers rms.h, rab.h, and fab.h are used to access the RMS file system; and on UNIX, the special set sys/*.h is provided. An implementer can provide as many system headers as he needs. VAX C, for example, comes with about 200.
Since system headers are useful to all applications, they are typically stored in the same place as standard headers. A library header is one provided with a third-party library such as a windows, graphics, or statistical package. Again, a product may include many headers and you may use a number of different libraries in the same application. Library headers are also universally shareable and will likely reside with standard and system headers. An application header is one you design for a particular application and as such, it should be located in a place separate from headers in the other three categories. It is possible, however, that over the course of designing an application, you build a header that is useful beyond the life of the current system. This header, then, should really be treated as a miscellaneous library header. If each programmer on the project develops his own private miscellaneous headers, naming conflicts can easily arise, so you must ensure that private headers are not used. During testing stages of a project, it can be very tempting to provide a quick (and often dirty) fix to a given problem by simply changing a header and recompiling the offending source module. However, this can cause other nasty side-effects later on when the system as a whole is rebuilt. Also, you must never, never, ever even think of changing a standard, system, or library header; these are sacred. For example, you might discover you need macros called TRUE and FALSE in several modules and since stdio.h is included in all of them, why not simply add definitions for these macros to that header? After all, it can't hurt any existing uses of these headers, can it? Apart from reflecting bad style, these changes are lost when you next (re)install the compiler. One solution to this is to make all headers, including application headers that have been moved to production, read-only.
That way, if you should ever try to change or overwrite them you are reminded of the seriousness of such an action. Header Names ANSI C requires the standard header names to be written in lower case. Do so even if your file system is case insensitive (as is the case with MS-DOS and VAX/VMS). In fact, ANSI does not require that filenames of the form header.h be supported by your file system. The compiler must accept #include <header.h>, but is allowed to map the period or any other part of that header name to other characters. The convention of naming headers with a .h suffix is exactly that, a convention, and it need not be followed by user-written headers. Certainly, it's a useful default convention if you have no good reason to do otherwise. If you wish to port code, keep in mind that the length of significance, case distinction, and format of filename (assuming a header is a file) are all implementation-defined. It is generally considered bad style to specify device and/or directory information in a header name. Considering that almost all compilers provide compile-time options and/or environment variables to specify an include search path, I see no reason to unduly reduce your flexibility. Header Contents Just what should go in a header and how big should headers be? It is relatively easy to answer the "what." If something cannot be shared, it does not belong in a header. For the record, candidates for inclusion in a header are: macros, typedefs, templates for structures, unions, and enumerations, function prototypes, extern data declarations, and preprocessing directives. Placing anything else in a header needs careful scrutiny. In particular, including executable code that is not inside a macro definition is very bad style. My rule of thumb is to put all related stuff together in one header. However, if that makes for a very large header and the contents can easily be broken into logical subsets, then I prefer each subset be in its own header.
It's useful to give such headers names with the same prefix so you can easily determine they are related. The only difference here is whether the preprocessor has to process one big header instead of just those parts it needs. Don't get too hung up on worrying how much work the preprocessor has to do unnecessarily, since that's what CPU cycles are for. In fact, in the extreme case where you put each declaration in its own header, the preprocessor won't need to do any extra work, except for opening and closing all those headers. It's quite likely that, while most things will fit neatly into related groups each in a header, some miscellaneous bits will be left over. About the only way to handle these reasonably is a miscellaneous header. ANSI C has one of these, called stddef.h. Whatever organization you choose, everything that can be shared should be shared. That is, you should make sure that all macros, function prototypes, etc., are part of some header and not hard-coded in source files directly. Each header should be self-contained. If one header refers to something in another header, the first should directly include the second. Forcing the programmer to know and remember the order in which related headers need be included is burdensome and unnecessary. Protecting Header Contents It is very likely that in some source modules you will include the same header multiple times, once directly and one or more times indirectly via other headers. Since everything in a header is supposed to be shareable, there should be no problem in processing the same header multiple times except the extra work of preprocessing. Right? Well, that's not quite true. Specifically, if the same typedef or structure, union, or enumeration template definition is seen more than once, the compiler produces an error, so these definitions must somehow be protected.
The best way to achieve this is to place a conditional compilation protective wrapper around the whole header, as follows:

/* header local.h */
#ifndef LOCAL_H
#define LOCAL_H
...
#endif

I prefer to use a macro spelled in upper case the same as the header, along with a suffix of _H. This naming convention is easy to understand and is very unlikely to be used for other macros elsewhere in the set of headers. A shorter name such as LOCAL could easily be used as a different macro elsewhere, leading to confusion. Since the standard headers can also be included multiple times and some of them contain typedefs and structure templates, these too must be protected. Check those provided with your compiler to see if they indeed are protected. The only difference between your wrapper and that used by the standard headers is that you must not begin your private macro name with an underscore, while they must, since that's the implementer's namespace. It is preferable to have each thing defined in one, and only one, header. However, for various reasons it may be desirable to duplicate something in multiple headers. The problem here is to make sure that all of those headers containing duplicates can be included at the same time. For example, consider the case of having a typedef for count in two headers, as in Listing 1. You should also check your standard headers for this kind of protection, since size_t, the type of the result of the sizeof operator, is required to be typedefed in five of them. Note that ANSI C places strict rules on whether a standard header can include another standard header. For example, most identifiers defined in a standard header are only "reserved" if their parent header is included. For instance, if you don't include one of the six standard headers that define NULL, you are perfectly safe in defining your own identifier NULL, even though it would be bad style.
Consider what would happen if assert.h included stdio.h: all the names in stdio.h would become defined as well, even though they are not defined in assert.h. And while assert.h could contain #undefs to remove these, there is no way for it to remove any typedefs or template definitions. Many mainstream compilers claiming ANSI conformance, or claiming to be tracking the ANSI standard, break this rule. As such, they are not ANSI-conforming. Check your standard headers for this. Conditional Inclusion There are a number of ways to conditionally include headers as necessary. Perhaps the best is to conditionally compile a subset of #include directives inside a header, based on the existence or value of a macro defined using a compiler option. That is, the compilation path is specified outside all source modules. This way, you can trigger any possible conditional compilation path from as few as one macro. You also have the ANSI invention of #include MACRO, where MACRO must expand to a header name of the form <...> or "...". You can also use the stringizing and token-pasting preprocessor operators # and ##, respectively, to construct a macro that expands to a header name. I have also found that it is a good idea to remove as many preprocessing directives as possible from source modules into headers. In particular, I find conditional compilation directives in source code to be most distracting, especially when there are more than two compilation paths. The aim is to isolate such dependencies into headers so you can forget about them and get on with the business of implementing or maintaining the application. An example of this strategy follows:

#if TARGET == 1
fp = fopen("DBAO:[direct]master.date", "r");
#else
fp = fopen("A:\\direct\\master.date", "r");
#endif

This can be implemented in a much clearer way by abstracting the filename into a header, as in Listing 2. Planning For Debugging And Maintenance People who don't design programs are unlikely to plan for debugging and maintenance.
Such people probably don't even write shopping lists. Unfortunately, there are lots of these people programming, many of them in C. It is very naive and probably irresponsible to believe that with a non-trivial program, debugging will be a mere formality and that you will always be around to maintain the code. Over the years I have found it a useful idea to include a header called something like debug.h in every source file I write when working on a non-trivial project. If the header is empty, that's fine. However, it makes it very easy to add or change that header's contents and recompile all or part of the system for testing. Since you have one header included everywhere, it is trivially easy to make powerful changes and to experiment. And the cost of having this flexibility is practically nothing, if you cater for it at the beginning. Concatenating Headers There are always people who try to stretch a language's capabilities to the extreme. For example, they place part of a source file in one header and the rest in another and include them both to form a valid source module. Cute, but very bad style. Let's look at just what can and cannot be split across multiple source modules, and therefore across multiple headers. A source module must contain complete tokens. That is, a source token cannot be split across two files. Specifically, the notation of backslash/newline continuation cannot be used in the last line of a source file. Likewise, a comment cannot span two files. With string literal concatenation now supported by ANSI, you could have a string in one file concatenated with a string in another, but that would require the strings to be outside a macro definition, and I have already said that's very bad style. You could also split a structure template definition across multiple files, but I see no benefit.
One thing not immediately obvious in ANSI C is that each matching set of #if/#endif and corresponding #elif and #else directives must be contained within the same source file. That is, the #if and matching #endif directives must be in the same source file. Conclusion I have addressed many issues here, most of which have arisen from my own experiences. I am sure there are others that could be added. For the most part, I find header design to be simply a matter of common sense once you know and understand the tools the language and preprocessor provide. But then again, I find that to be pretty much the solution to a vast number of problems. It's sad that common sense is not all that common.

Listing 1

/* h1.h */
#ifndef H1_H
#define H1_H
...
#ifndef COUNT_T
#define COUNT_T
typedef unsigned int count;
#endif
...
#endif

/* h2.h */
#ifndef H2_H
#define H2_H
...
#ifndef COUNT_T
#define COUNT_T
typedef unsigned int count;
#endif
...
#endif

#include "h1.h"    /* count defined */
#include "h2.h"    /* count not redefined */

Listing 2

/* files.h */
#if TARGET == 1
#define MASTER_FILE "DBAO:[direct]master.date"
#else
#define MASTER_FILE "A:\\direct\\master.date"
#endif

/* source.c */
#include "files.h"
...
fp = fopen(MASTER_FILE, "r");

On The Networks Games And Tongues Sydney S. Weinstein Sydney S. Weinstein, CDP, CCP, is a consultant, columnist, author and President of Datacomp Systems, Inc., a consulting and contract programming firm specializing in databases, data presentation and windowing, transaction processing, networking, testing and test suites and device management for UNIX and MS-DOS. He can be contacted care of Datacomp Systems, Inc., 3837 Byron Road, Huntingdon Valley, PA 19006-2320 or via electronic mail on the Internet/Usenet mailbox syd@DSI.COM (dsinc!syd for those that cannot do Internet addressing). RPN Fans - Here's One For You Before I took over David Fiedler's column, he mentioned in his last installment the ultimate on-screen calculator for UNIX systems.
Here now is a simpler one, usable on any system that has a curses package or emulation library. It emulates the HP-16C and can pop up on both UNIX and MS-DOS. Support for floating point, hexadecimal, decimal, octal and binary modes is provided. The calculator, written by Emmet Gray of the US Army, has ten registers and supports computer-oriented functions. It was posted to comp.sources.misc and is available from the archive sites that support that group, including uunet. New Games New versions of several games were distributed recently in comp.sources.games. These include version 4 of Conquer, a Middle-earth multi-player game for UNIX systems. Source to the game itself, as well as the patches, is available from the archive sites for comp.sources.games, including uunet. Conquer v4 patches are volume 8, issues 1 - 4. Nethack has also had a major update in comp.sources.games volume 8, issues 6 - 12. New screens and enhancements were added to this display-oriented dungeons and dragons game. Galactic Bloodshed, an empire-like war game, has also been upgraded this month in comp.sources.games, volume 8, issues 26 - 30. This upgrade gives several new versions to keep those UNIX systems busy. A new game has also appeared, a two-handed card game similar to Bridge and Spades (especially two-handed Spades). It's a trick-taking game with a trump suit determined by bidding. Cards are drawn from the deck, each player taking a turn drawing one card from the top of the deck. If you desire to keep that card, it becomes part of your hand and the next card is discarded without being seen; otherwise you discard it and take the next card. This yields two thirteen-card hands. Bidding is based on the number of tricks you think you can take, with the last winner naming the trump. Lastly, the hand is played out. Scoring is simple; if the bid is made, you score ten times the bid plus the number of overtricks. If you go down and don't make the bid, you score negative ten times the bid.
The winner is the first player to 250 points. The author, Scott Turner from UCLA, has asked for help in improving the bidding process. He has provided a program with a very interesting set of bidding options coded as rule-based, as neural networks, and then as a cheating bidder that reads both hands. However, he is not happy with the outcome and is asking for help. The program gives ample statistics for tuning a bidding algorithm, and those of you up to a challenge just might want to take him up on his offer for help. Back To Work Several serious works also appeared recently on the networks. For those diehard fans of vi-type editors, comp.sources.misc recently distributed "stevie" (ST Editor for VI Enthusiasts), a public domain clone of UNIX's vi editor. This version was developed for the Atari ST, but has since been ported to UNIX, OS/2, DOS and Minix-ST. Unsupported ports included in the release are Minix-PC, Amiga, and some Data General systems. Thus, stevie appears to be extremely portable. Makefiles are included for all the systems. Stevie's main drawback, for some environments, is that it keeps the file being edited in memory, limiting the size of the file to be edited on systems with smaller address spaces or without virtual memory. It was originally written by Tim Thompson, but this latest version was posted by Tony Andrews at onecom!wldrdg!tony. He also will mail diskettes to those who send him a formatted disk along with a self-addressed, stamped disk mailer for returning the disk. He can write Atari ST (SS or DS) or MS-DOS (360K or 1.2M) formats. His address is Tony Andrews, 5902E Gunbarrel Avenue, Boulder, CO 80301. Now that Berkeley has released much of its BSD 4.3-tahoe release to the public, sections of it are being ported to UNIX System V and Xenix. Comsat, the BSD mail notification daemon, was recently posted to comp.sources.misc. Comsat sends messages to users when mail is delivered for them.
It uses a daemon approach, and thus does not need to wait for the current command to complete or for the user to type a carriage return to the shell. Also included in this port are changes to smail v2.5 necessary for it to notify comsat when mail is delivered. Users control whether or not they get notification using the biff command, which is also included. Since UNIX System V usually doesn't support the Berkeley socket interface, this port uses named pipes, so the notification is limited to the local machine. Those with the socket interface can use the BSD version of the program. Thanks to David MacKenzie for his porting effort. Foreign Tongues? In volume 8, issues 65 - 87, comp.sources.misc distributed a major effort that will strike people as either a godsend or totally useless. If you need to print foreign languages with their extended character set support, the "cz text to PostScript system" is for you. It is a table-driven system that can be used to convert any "context-free octet-based character set" into PostScript. This means that every character in the character set is represented by one or more eight-bit bytes and that only the bytes of that character determine what it prints, not other bytes in the file. This excludes locking shift sequences. Even if you don't need the foreign language support, the posting had an addendum called libhoward that includes several C functions to convert numeric literals to internal representations and perform string manipulation, all with error recovery. It's all documented and worth looking at, even just to see how he did it, courtesy of Howard Gayle of Ericsson Telecom AB in Sweden (howard@dahlbeck.ericsson.se). Yea! It's Back, Maybe? After a long absence from USENET with no postings, comp.sources.unix distributed the first program of Volume 20. It is a contribution from Barry Books at IBM, releasing into the public domain an include file tester. This tester checks include files for POSIX 1003.1 and ANSI compliance.
It reports missing items, additional items allowed by the standard, and additional items not allowed by the standard. References to the standards documents are also included in the report. This could prove to be a really useful tool for portability. Unfortunately, after this promising posting, comp.sources.unix has been quiet again. Hopefully, Rich Salz, the moderator, will find time to resume the postings shortly. Upcoming Releases Perl, Larry Wall's Practical Extraction and Report Language, is going through its beta period on a new version via alt.sources. Version 3 has lots of new features, and next time I will give an in-depth review of this new release from one of the net's most respected authors of "Off the Wall Software". Less, a more replacement (a display pager), is also in beta test with its newest release. alt.sources is wonderful for hints of what is to come. Many authors are using it for beta test distributions. Another major package is also in its latest beta round; the Extended Portable Bitmap Toolkit appeared recently in alt.sources. This set of tools is used to convert images from one bitmap format to another. It supports many formats and, again, next time, a more detailed report. If you have a pending release you would like covered in this column, drop me a line. My electronic address is syd@DSI.COM and I look forward to hearing from you. Questions & Answers malloc, Porting, And Stack Overflow Ken Pugh Kenneth Pugh, a principal in Pugh-Killeen Associates, teaches C language courses for corporations. He is the author of C Language for Programmers and All On C, and is a member of the ANSI C committee. He also does custom C programming for communications, graphics, and image databases. His address is 4201 University Dr., Suite 102, Durham, NC 27707. You can fax questions to (919) 493-4390. When you hear the answering message, hit the * button on your telephone.
Or you can send e-mail to kpugh@dukeac.ac.duke.edu (Internet) or dukeac!kpugh (UUCP). Q I was having problems using malloc on a UNIX machine. After allocating some memory with malloc(), I wrote past the end of the allocated memory. The next time I called malloc(), it hung up. I ran the same program on an IBM-PC and it worked fine. What gives? Jim Campbell Durham, NC A Writing beyond (or before) the memory space that is allocated with malloc and related functions can cause some serious problems. These functions allocate a block of memory from the heap (memory space not used for code, data, and stack). They return the address of the memory block. The memory remains allocated until you call free(), passing it the address of the block. This deallocates the block and returns it to the heap. When the program exits, the system will free any allocations for which you have not called free(). These functions look like:

#include <stdlib.h>

void *malloc(size_requested)
size_t size_requested;    /* number of bytes */

void free(pointer)
void *pointer;    /* address of memory to free */

You request an amount of memory in bytes. The function returns to you an address which points to the first byte of the allocated memory. You can use this memory for any purpose. However, you should not write in the memory preceding or following the allocated block. The operating system and/or the compiler usually use a few bytes of memory adjacent to the allocated block. These bytes, sometimes called the "block header", may come before or after the block. The header keeps such information as the size of the block allocated, and usually some pointers, including one to the next block (i.e., a linked list). If the information in this block header is destroyed, the system cannot allocate a new block or deallocate an old block. Basically, the block looks something like the diagram in Figure 1. Let's assume that the information is kept after the block, as it appears in the case of your UNIX machine.
You probably did something like:

    char *pc;
    char *pc1;

    pc = malloc(100);
    ...
    *(pc + 100) = 0;
    ...
    pc1 = malloc(200);

and overwrote the first byte in the block header. When you attempted the next allocation, malloc() hung up because you had destroyed the block header for the previous block. On a PC, the block header typically appears before the allocated memory. In that case, your program ran okay, as you were simply writing into unallocated memory, which contains no information. Depending on the order in which you perform allocations and illegal accesses, you could still have problems. For example, let's assume that you performed both allocations first, and then an illegal access:

    char *pc;
    char *pc1;

    pc = malloc(100);
    pc1 = malloc(200);
    *(pc + 100) = 0;

Assuming that you do not attempt to allocate blocks later on in the program, this will execute as if no error occurred until the program attempts to exit. When the operating system tries to free the allocated memory, it will become confused by the erroneous block header information. You will get the dreaded "Memory allocation error -- system halted" message. With some compilers, malloc() does not call the operating system routine if the request can be satisfied from its own unallocated buffer. In this case, you may not see this allocation error, since the exit operations will simply free the whole buffer at once and not the individual pieces. Q I am using an array of pointers; each pointer points to a structure; and each structure contains several strings of various lengths. 
My array of pointers is declared something like this:

    struct {
        char firstname[MAX_FIRSTNAME+1];
        char lastname[MAX_LASTNAME+1];
        char homephone[MAX_HOMEPHONE+1];
        char workphone[MAX_WORKPHONE+1];
        char areacode[MAX_AREACODE+1];
        char street[MAX_STREET+1];
        char city[MAX_CITY+1];
        char state[MAX_STATE+1];
        char comments[MAX_COMMENTS+1];
    } *record[MAX_RECORDS];

It follows that I could display each element of the structure that represents the current record as follows:

    show_record()
    {
        printf("%s\n", record[current_record]->firstname);
        printf("%s\n", record[current_record]->lastname);
        printf("%s\n", record[current_record]->homephone);
        printf("%s\n", record[current_record]->workphone);
        printf("%s\n", record[current_record]->areacode);
        printf("%s\n", record[current_record]->street);
        printf("%s\n", record[current_record]->city);
        printf("%s\n", record[current_record]->state);
        printf("%s\n", record[current_record]->comments);
    }

However, it seems that much of the code is unnecessarily duplicated. It would be more efficient if I could create a loop and access a different element of the structure each time through the loop. My show_record() function would then look something like this:

    show_record()
    {
        int i;

        for (i = 0; i < NUM_OF_FIELDS; i++) {
            printf("%s\n", record[current_record]->??? );
        }
    }

where ??? is the part I can't figure out. I could think of ways to do it in assembly language by providing additional data types and accessing them in the loop. Since the elements of a structure are usually word aligned, it's hard to even be sure how many bytes are between the elements of the structure. Again, any information you could provide would be greatly appreciated. Jonathan Wood Irvine, CA A Accessing individual members of a structure in a loop is a commonly needed operation. There are several ways that you can do this. Let me change your structure template slightly and add a tag-type. 
I normally avoid declaring variables when declaring a structure template. A clean structure template is a handy thing to have around because it makes declaring variables of the same structure in another program a breeze.

    struct s_record {
        char firstname[MAX_FIRSTNAME + 1];
        ...
    };

    struct s_record *record[MAX_RECORDS];

You could use a static variable, which will have constant addresses, and set up an array of pointers to those addresses. (An array name such as print_record.firstname yields the address of the array's first element.) show_record() might then look like:

    static struct s_record print_record;

    #define NUMBER_FIELDS 9
    char *record_field_address[NUMBER_FIELDS] = {
        print_record.firstname,
        print_record.lastname,
        ...                     /* Remainder of the fields */
    };

    show_record()
    {
        int i;

        /* Copy in the record to be printed */
        print_record = *record[current_record];
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%s\n", record_field_address[i]);
        }
    }

One feature in the new ANSI standard, the offsetof() macro, can help you out here. Its syntax is:

    #include <stddef.h>

    offsetof(type, member-name)

The type is a structure type and the member-name is a member in the structure. Instead of keeping the address of individual members in an array, you simply keep the offsets from the start of a structure. For example,

    #define NUMBER_FIELDS 9
    size_t record_offsets[NUMBER_FIELDS] = {
        offsetof(struct s_record, firstname),
        offsetof(struct s_record, lastname),
        ...                     /* Remainder of the fields */
    };

Now show_record() could look something like:

    show_record()
    {
        int i;
        char *pc;

        pc = (char *) record[current_record];
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%s\n", pc + record_offsets[i]);
        }
    }

Note that the conversion of the address to a char pointer is necessary. If you simply printed out record[current_record] + record_offsets[i], you would get the address of something which is record_offsets[i] * sizeof(struct s_record) past the beginning of the record. 
I would suggest that you change the calling sequence of show_record() so that it expects a record (or the address of a record). This way, you can print out records that are not part of the array (such as a record that might be used for input purposes).

    show_record(record)    /* Prints out a record */
    struct s_record record;
    {
        int i;
        char *pc;

        pc = (char *) &record;
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%s\n", pc + record_offsets[i]);
        }
    }

or

    show_record(precord)   /* Prints out a record, whose address is passed */
    struct s_record *precord;
    {
        int i;
        char *pc;

        pc = (char *) precord;
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%s\n", pc + record_offsets[i]);
        }
    }

You might want to be even more organized and create another structure that contains not only the offsets, but also the names of the members, so that you can use the same names everywhere you print the record.

    struct s_field {
        char name[MAX_FIELD_NAME + 1];
        size_t offset;
    };

    #define NUMBER_FIELDS 9
    struct s_field fields[NUMBER_FIELDS] = {
        {"First name", offsetof(struct s_record, firstname)},
        {"Last name", offsetof(struct s_record, lastname)},
        ...                     /* Remainder of the fields */
    };

With this you might have a function like:

    show_record_with_field_names(precord)
    /* Prints out a record, whose address is passed */
    struct s_record *precord;
    {
        int i;
        char *pc;

        pc = (char *) precord;
        for (i = 0; i < NUMBER_FIELDS; i++) {
            printf("%-20.20s: %s\n", fields[i].name,
                pc + fields[i].offset);
        }
    }

You should note that elements of a structure are not necessarily word aligned. On a PC, they can be byte aligned or word aligned. I prefer packed (i.e., byte-aligned) structures, in order to save space, but there is a slight speed advantage in using non-packed structures. Note that the sizeof() operator and the offsetof() macro take into account any padding bytes (unused bytes due to alignment). In fact, it is the potential presence of padding bytes that made the ANSI committee eliminate the equality comparison of structures. 
For example:

    func()
    {
        static struct s_record record_1;
        struct s_record record_2;

        if (record_1 == record_2)   /* Not legal C */
            ...
    }

The padding bytes in record_1 will be set to 0, since it is a static variable. The padding bytes in record_2 will be garbage, since record_2 is an automatic variable. You could use the fields array shown above to create a structure comparison function, if you required it. Q How do you make a binary data file that is portable between the MAC and the IBM PC? Richard Walton Wellesley, MA A Porting data files between any two systems presents a problem in that the representation of the numbers varies from computer to computer. A common way of avoiding this problem is to output the data to a text file using fprintf() and to read the data on the other machine using fscanf(). For example, on one machine you would have:

    struct s_record {
        int one_number;
        double another_number;
    };

    write_record_to_file(data_file, record)
    FILE *data_file;
    struct s_record record;
    {
        int ret;

        ret = fprintf(data_file, "%d %lf\n",
            record.one_number, record.another_number);
        return ret;
    }

On the other machine, you would use:

    read_record_from_file(data_file, precord)
    FILE *data_file;
    struct s_record *precord;
    {
        int ret;

        ret = fscanf(data_file, "%d %lf",
            &(precord->one_number),
            &(precord->another_number));
        return ret;
    }

If you do not wish to have the overhead of the conversions done by fprintf() and fscanf(), then you will need to write some machine-specific code. 
For example, suppose on an IBM you have written out the records as:

    write_record_to_file(data_file, record)
    FILE *data_file;
    struct s_record record;
    {
        int ret;

        ret = fwrite(&record, sizeof(struct s_record), 1, data_file);
        return ret;
    }

On the other machine, you will have to rearrange the bit patterns manually:

    #define SIZE_BUFFER 8   /* Size of record on other machine */

    read_record_from_file(data_file, precord)
    FILE *data_file;
    struct s_record *precord;
    {
        int ret;
        char buffer[SIZE_BUFFER];

        ret = fread(buffer, SIZE_BUFFER, 1, data_file);
        /* Now you need to convert each value individually */
        convert_ibm_int_to_mac_int(&buffer[0],
            &(precord->one_number));
        convert_ibm_double_to_mac_double(&buffer[2],
            &(precord->another_number));
        return ret;
    }

Now each of the individual members must be dealt with separately. The double conversion is a bit of a bear. As they say in the teaching business, it is reserved as an exercise for the student. The integer conversion might look like:

    convert_ibm_int_to_mac_int(pibm_number, pmac_number)
    char *pibm_number;
    char *pmac_number;
    {
        /* Reverse the byte order */
        *(pmac_number) = *(pibm_number + 1);
        *(pmac_number + 1) = *(pibm_number);
    }

Note that I have simply shown a return value for each of these file functions. You probably want to be more clever and test the return values so that they are consistent among all the functions. For example, the first function might look like:

    #define BAD_IO 1
    #define GOOD_IO 0

    write_record_to_file(data_file, record)
    FILE *data_file;
    struct s_record record;
    {
        int ret;
        int io_ret;

        ret = fprintf(data_file, "%d %lf\n",
            record.one_number, record.another_number);
        if (ret < 1)
            io_ret = BAD_IO;
        else
            io_ret = GOOD_IO;
        return io_ret;
    }

Q I am in the process of implementing hotkey-controlled real-time data acquisition for some laboratory experiments. This is being achieved by taking control of keyboard interrupt number 0x09. My compiler is Microsoft C v5.1. 
The experimental apparatus has three distinct modes of operation: A, B, and C, which are to begin upon the striking of their respective keys from the keyboard. Assume that task A, defined by its function, fA(), is currently executing and that the user now strikes the key to commence task B, similarly defined by its function, fB(), so that fA() stops and fB() starts. My question is this: Can you continually interrupt function i and start function j and expect to escape a stack overflow? How does one handle suspending a function at an arbitrary time with no a priori intention of returning to it (which would free the stack space used by the function). I would imagine that you could do this a few times, but what about suspending A and starting B (or C) an arbitrary number of times? Perhaps setjmp() and longjmp() are the solution. Another serious problem that concerns me is that my method does not seem to admit a way to signal end-of-interrupt to the keyboard handler (or to whatever is listening). Because the directives to begin execution of function A, B, or C are embedded in the new 0x09 interrupt handler, the handler could potentially never finish executing during the experiment. Is there a better implementation which can achieve what I need and still use hotkeys? Mark S. Petrovic Stillwater, OK A You are right in your concern over stack overflow. If you keep calling an interrupt function without clearing up the stack (i.e., with an IRET instruction), you will eventually run out of stack space. An interrupt function that might cause overflow could look like the following, where keyboard_input() is a function that gets the actual keystroke. 
    control_function()
    /* This will only be called on a keyboard interrupt */
    {
        int c;

        /* Get the key that was hit */
        c = keyboard_input();
        switch (c) {
        case 'A':
            function_a();
            break;
        case 'B':
            function_b();
            break;
        case 'C':
            function_c();
            break;
        default:
            function_default();
            break;
        }
        /***** This function never returns *****/
    }

    function_a()
    {
        /* Code to perform function A */
    }

    function_b()
    {
        /* Code to perform function B */
    }

    function_c()
    {
        /* Code to perform function C */
    }

    function_default()
    {
        /* Code to perform default function */
    }

Every time you invoke the interrupt, another set of flags and return addresses is pushed onto the stack. setjmp()/longjmp() provide an appropriate mechanism for implementing the sort of structure you desire. These two functions allow you to set a place marker in your code (setjmp) and then jump directly back to it from another routine (longjmp). Without setjmp/longjmp, to report an error that occurred several levels deep in a program, you would have to return an error value at every level as you exit the nested calls. With setjmp/longjmp you can instead simply jump back to a central error handler and give it the error value. The function calls are:

    #include <setjmp.h>

    int setjmp(environment)
    jmp_buf environment;    /* Will hold the place information */

and

    void longjmp(environment, return_value)
    jmp_buf environment;    /* Place information from setjmp */
    int return_value;       /* To be returned to setjmp */

setjmp() returns 0 the first time it is invoked. The calling function can test for this and ignore any error condition. When longjmp() is called, the next C instruction to be executed is the equivalent of a return from setjmp(). This returns execution to the place marked by the call to setjmp(). One of the parameters to longjmp() is a non-zero value which becomes setjmp()'s return value. longjmp() cleans up the stack from any nested function calls. The parameter passed to setjmp() is of type jmp_buf. This variable holds information regarding the current position of the stack. 
You can call setjmp() in many different places and pass it different variables of type jmp_buf. The variable passed to longjmp() determines to which of the setjmp() calls it will return. The code below gives an indication of how your problem might be programmed. You would connect this up to the keyboard interrupt. Note that the jmp_buf must be static, so that it retains the place information between interrupts.

    #include <setjmp.h>
    #define TRUE 1
    #define FALSE 0

    control_function()
    /* This will only be called on a keyboard interrupt */
    {
        int c;                      /* Character input */
        int ret;                    /* Return value from setjmp() */
        static jmp_buf environment; /* For the setjmp */
        static int init = FALSE;    /* First time through flag */

        if (init) {
            /* Stop previous execution */
            longjmp(environment, 1);
        }
        ret = setjmp(environment);
        if (ret == 0) {
            /* This is the return from the initial setup */
            init = TRUE;
        }
        else {
            /* This is the return from the longjmp */
            ;
        }
        /* Get the key that was hit */
        c = keyboard_input();
        switch (c) {
        case 'A':
            function_a();
            break;
        case 'B':
            function_b();
            break;
        case 'C':
            function_c();
            break;
        default:
            function_default();
            break;
        }
        /******* THIS FUNCTION NEVER RETURNS *******/
    }

Alternatively, you could avoid using an interrupt by coding each function to periodically check for something on the keyboard stack. This approach does kludge up your lower level functions. However, if the lower level functions have sections of code that should not be interrupted, then this less elegant method may be preferable. Two Microsoft (and some other compiler) functions (not ANSI standard) support this alternate approach. The kbhit() function returns non-zero if there is a key in the buffer. The getch() function returns a character from the buffer, without waiting for a carriage return. 
    #include <conio.h>
    #define TRUE 1
    #define FALSE 0

    main()
    {
        int c;

        while (1) {
            /* Get the next key */
            c = getch();
            switch (c) {
            case 'A':
                function_a();
                break;
            case 'B':
                function_b();
                break;
            case 'C':
                function_c();
                break;
            default:
                function_default();
                break;
            }
        }   /* End while loop */
    }

    function_a()
    {
        /* Code to perform function A */
        /* Inside each loop: */
        if (kbhit())
            return;
    }

    function_b()
    {
        /* Code to perform function B */
        /* Inside each loop: */
        if (kbhit())
            return;
    }

    function_c()
    {
        /* Code to perform function C */
        /* Inside each loop: */
        if (kbhit())
            return;
    }

Figure 1 New Releases A New Year's Wish List Kenji Hino Kenji Hino is a member of The C Users' Group technical staff. He holds a B.S.C.S. from McPherson College and an undergraduate degree in metallurgy from a Japanese university. He is currently working toward an M.S.C.S. at the University of Kansas. New Releases CUG299 -- MEL and BP This volume contains two programs, MEL -- Universal Metalanguage Data Processor submitted by George Crews (TN), and BP -- Back Propagation for neural networks by Ronald Michaels (TN). MEL provides an I/O interface between a program and the user. It can take input data written in "pseudo-English" and translate it into program variables. It can also translate a program's variables into pseudo-English. (See the article on page 33 in this issue.) MEL was originally designed for use with engineering analysis programs. It was written in ANSI C and was developed using Microsoft C v5.1. The disk includes MEL source code, a test example program, sample input and output files, documentation, and the article and listings from this issue. Since MEL provides only a processor engine, you need to define your own input and output data format rules (called a dictionary) for your application program in mel.h. BP is a simple implementation of the back propagation algorithm as an example of a neural network. 
The implementation is based upon the article in Nature, "Learning representations by back-propagating errors" by Rumelhart, Hinton, and Williams. BP employs an adaptive algorithm that converges as a result of learning. BP was developed on an AT clone with a math coprocessor using Zortech C v1.07. The disk also includes a Hercules graphics version of BP. CUG300 -- MAT_LIB Our first volume in the 300s is a shareware package, MAT_LIB -- Matrix Library, submitted by John J. Hughes III (TN). MAT_LIB includes approximately 50 C language functions and macros which input and output tabular data maintained in ASCII text files. While the tabular data is in RAM, it is stored in dynamically-allocated token or floating-point arrays on the heap. Functions are provided to examine an ASCII text file to determine the number of rows, columns, and token size of the tabular data in the file. Other C macros dimension either a floating-point or string token array large enough to hold the ASCII data. Once the data is in memory, floating-point matrix operations can be performed on it. Token array data can be converted to and from float or integer values. Floating-point arrays which have been modified by calculation can be merged into token arrays for output, or they can be output to a text file directly. The output text files, which remain in MAT_LIB text file formats, can in turn be used as input for later application programs. The disk includes a users manual, test programs, example programs, and small and medium model libraries for Turbo C. The library source can be obtained for $20 from the author (John Hughes III, 928 Brantley Dr., Knoxville, TN 37923). CUG301 -- BGI Applications This volume contains graphics applications that use the Borland Graphics Interface (BGI), submitted by three authors, Mark A. Johnson (CO), Henry M. Pollock (MA), and John Muczynski (MI). All programs were compiled with Turbo C and use BGI files. The disk includes C source code, executable code, and BGI files. Mark A. 
Johnson has created DCUWCU -- a simple application environment that provides a mouse-driven cursor, stacked pop-up menus, and forms that contain editable fields and a variety of selectable buttons. The sample program DRAW allows you to draw lines, circles, and text on the screen using a mouse. A stacked pop-up menu can be invoked anywhere on the screen (Figure 1). DRAW uses public domain Microsoft mouse routines written by Andrew Markley (CUJ Sept/Oct 1988). An article describing DCUWCU appeared in the Jan '89 issue of CUJ (p. 67). Henry M. Pollock has submitted a demonstration program combining trig functions and graphics functions in Turbo C. By selecting an option from the menu, the program displays circleoids, asteroids, spirals, cycloids (Figure 2), etc. My review of the JJB library in the October 1989 issue prompted John Muczynski to create a graphics pull-down menu system with deeply nested menus. The separate include code allows you to change key assignments and create macros. The new configuration may be saved and restored. He also has submitted an example program, "Conway's game of life," using the pull-down menu. Updates CUG295 -- blkio Library The blkio library released in the November issue has been updated. Version 1.1 includes minor bug fixes and modifications. Retrospective CUG started collecting and maintaining public domain source code (originally just BDS C source code) nine years ago. The library started with just ten standard CP/M 8-inch disks. Currently, the total number of volumes (one volume includes one to three 360K MS-DOS disks) has surpassed 300. The past nine years have brought remarkable changes in C compiler technology and in the microcomputer marketplace. Figure 3 shows the change in formats requested by our members. Over the past three years, CP/M has become virtually extinct and MS-DOS has come to dominate. More interesting, however, is the diversity of operating systems used in recent years. 
Macintosh, UNIX/Xenix, Atari and Amiga have appeared more than ever -- indicating that more and more programmers who use non-MS-DOS operating systems are interested in C and are seeking portable C source code. I think this trend is strong evidence that C is a portable language. Table 1 shows the 20 most popular disks in the last three years. The most-ordered CUG disk is MicroEmacs v3.9 (CUG#197 and CUG#198). MicroEmacs faithfully implements most of the features of Richard Stallman's Emacs editor. Daniel Lawrence claims copyright privileges for this version, which has also been updated and enhanced many times by our staff and members. The secret of MicroEmacs' popularity seems to be its portability (it runs on more than ten different operating systems), rich set of features, and configurability -- a built-in macro language lets MicroEmacs be tailored to virtually any task. The next two most popular disks are UNIX tools used in compiler development. CUG#172, #173 and #290 are LEX, a lexical analyzer that extracts tokens from an input character stream. CUG#285 is a YACC-compatible parser and code generator. As you'll notice from the Top 20 list, our library contains a wide variety of application programs and development tools, including cross-assemblers, windows, graphics, an AI application, communications, and a math package, among others. One of the more recent trends in the library is the emergence of shareware. Even though you must pay some minimal fee for the source code of a shareware program, the quality of some volumes is very competitive with more expensive commercial products. Another trend is the submission of more serious and specialized applications -- for example, the 3-D medical imaging software on CUG#293-294. Wish List Even with all this diversity, there are many frequently requested packages. A Simple Text Editor Many people have asked for a simple text editor that can be embedded in their application. 
The editor needn't be fancy and powerful like MicroEmacs, but should offer these features:

    Be callable (as a function) from the application program
    Function in both full-screen and windowed applications
    Retrieve and save a file
    Browse a file (page up/down)
    Be modeless
    Support block manipulations (block copy, move, or delete)
    Compile under the small model under MS-DOS
    Read up to 30K of ASCII text
    Search and replace (optional)
    Go to a specified line number (optional)

An ANSI C Compiler This is a real challenge. We hope to address this need by distributing the GNU C compiler (and C++ compiler) from The Free Software Foundation. .PCX Or .DBF File Function Libraries A .PCX file is an image file produced by ZSoft's PC Paintbrush. It is a common graphics file format for the PC and is also used by most scanners, fax programs, and desktop publishing programs. A .DBF file is a data file used by Ashton-Tate's dBase programs. We need function libraries that manipulate these standard format files. Spread Sheet As with the editor, we need a simple spread sheet that can be embedded in larger applications. Pascal To C Translator This would be useful for Pascal programmers trying to port their programs. Michael Yokoyama (HI) has forwarded such a program to us, but we have been unable to contact the author, Per Bergsten of Sweden, to get permission to release the program. Please let us know if you can contact Per Bergsten or know of an independent version of this code. C To Pascal Useful for Pascal programmers who want to port an application program written in C. Cross C Compiler Thanks to Will Colley, we have a variety of cross assemblers. However, our only cross C compiler is CUG204, a 68000 C compiler by Matthew Brandt, which runs under MS-DOS and generates 68000 object code. We need more variety in this area (like a cross C compiler that runs on the Mac and generates 8086 code). 
Download Fonts To A Laser Printer All sorts of applications could make better use of laser printer capabilities if they could download special fonts. We'd like a library of functions that can read Bitstream, Ventura bitmap, and other popular font files and download them to an HP-compatible printer. Sideways Text Not a configuration utility that uses a printer's landscape mode, but a utility that exploits a printer's graphics mode to print text rotated 90°. Why not? Database Management We would like a simple and useful relational database manager -- in C. If you've seen C source code such as that listed here or can implement it, please let us know. In addition, we are interested in obtaining C++ and C source code for the Macintosh. Moreover, I believe you have your own wish list. Please let me know about it for a future column. P.S. Henri de Feraudy of France, the author of Small Prolog in CUG#297, is sending us a PC version of Little Smalltalk. It will be a new release in a future issue. Figure 1 Figure 2 Figure 3 Table 1

Year 1987
1. 173 LEX Part 1 (lexical analyzer)
2. 172 LEX Part 2
3. 198 MicroEmacs v3.9 Source (text editor)
4. 197 MicroEmacs v3.9 Executable & Documentation
5. 175 (Replaced with CUG285)
6. 174 (Replaced with CUG285)
7. 201 MS-DOS System Support (ANSI driver, TSRs, etc.)
8. 204 68000 C Compiler (cross compiler for MS-DOS)
9. 236 Highly Portable Utilities (UNIX-like tools)
10. 200 Small C Interpreter
11. 220 Window BOSS (window library)
12. 227 Portable Graphics
13. 164 Windows
14. 218 Dictionary Part I
15. 217 Spell & Dictionary Part II (spell checker)
16. 155 B-TREES, FFT, etc. (balanced binary tree, fast Fourier transform)
17. 228 Miscellany IX (window, ISAM routines, etc.)
18. 165 Programs from Reliable Data Structures (from Plum Hall)
19. 216 Zmodem & Saveram (communication)
20. 226 ART-CEE (rule-based inference engine)

Year 1988
1. 197 MicroEmacs v3.9 Exec. & Doc. (text editor)
2. 198 MicroEmacs v3.9 Source
3. 259 Console I/O & Withers Tools (window functions)
4. 255 EGA Graphics Library
5. 172 LEX Part 1 (lexical analyzer)
6. 173 LEX Part 2
7. 260 Zmodem, CU, tty Library (communication)
8. 236 Highly Portable Utilities (UNIX-like tools)
9. 151 Ed Ream's Screen Editor for IBM PC
10. 263 C_wndw Toolkit (windows)
11. 248 Micro Spell (spell checker)
12. 241 Inference Engine & Rule Based Compiler
13. 242 Still More Cross Assemblers
14. 155 B-TREES, FFT, etc. (balanced binary tree, fast Fourier transform)
15. 227 Portable Graphics
16. 247 Miracl (multi-precision integer and rational arithmetic C library)
17. 246 Cycles, Mandelbrot
18. 232 Little Smalltalk - Unbundled Part 2
19. 231 Little Smalltalk - Unbundled Part 1
20. 265 cpio Installation Kit (archive utility)

Year 1989 (Until October)
1. 197 MicroEmacs v3.9 Exec. & Doc.
2. 198 MicroEmacs v3.9 Source
3. 285 Bison for MS-DOS (YACC-like parser)
4. 290 FLEX (fast lexical analyzer)
5. 263 C_wndw Toolkit
6. 283 FAFNIR (general-purpose, table-driven forms engine)
7. 277 HP Plotter Library (graphics)
8. 173 LEX Part 2
9. 172 LEX Part 1
10. 284 Portable 8080 Emulator
11. 260 Zmodem, CU, tty Library
12. 236 Highly Portable Utilities
13. 276 Z80 and 6804 Cross Assembler
14. 155 B-TREES, FFT, etc.
15. 241 Inference Engine & Rule Based Compiler
16. 242 Still More Cross Assemblers
17. 273 Turbo C Utilities
18. 261 68K Cross Assembler for MS-DOS
19. 220 Window BOSS (window library)
20. 292 ASxxxx C Cross Assemblers

C Programmer's Toolbox/PC Kenji Hino Kenji Hino is a member of The C Users' Group technical staff. He holds a B.S.C.S. from McPherson College and an undergraduate degree in metallurgy from a Japanese university. He is currently working toward an M.S.C.S. at the University of Kansas. Unlike UNIX, MS-DOS has no standard utility programs to support C programmers in program development or maintenance. In the past, C programmers have developed their own tools from scratch or ported tools from other operating systems to MS-DOS. 
UNIX tools have been ported most often, simply because they are the "right" tools for improving programmer productivity. This report looks at a collection of UNIX-like tools, C Programmer's Toolbox/PC revision 2.0 by MMC AD Systems. Components The Toolbox/PC consists of Volumes I and II, which are available separately or together. I recommend getting both. Each volume includes two IBM 360K disks and costs $99.95; both volumes together go for $175. The manual (in a binder) describes both volumes, regardless of whether you purchase Volume I, II, or both. The C Programmer's Toolbox is available from MMC AD Systems, Box 360845, Milpitas, CA 95035, phone (408) 263-0781. Although the Toolbox/PC runs on PC/MS-DOS, MMC AD Systems also distributes versions of the Toolbox for the Macintosh MPW and the Sun UNIX system. Installing the Toolbox on either a floppy disk or hard disk system is straightforward: just copy all files from the distribution to your disk. If you install the Toolbox on a hard disk system, be sure that the path is set correctly. The Tools The Toolbox includes 21 tools (see Table 1). All the tools are command-line driven. The corresponding UNIX tools are also listed in the same table. The tools help analyze the structure, format, and execution of programs; manipulate and/or modify program input/output data; or verify program input/output data (see Figure 1). Covering all 21 tools in a report this size is impractical and undesirable. Thus, I will focus on the analytical tools, CFlow, PMon, and CritPath. These tools are mainly used to understand a program's structure and to analyze the performance of an application program for enhancement. CFlow Whether developing or maintaining a program, as the program becomes larger, you tend to lose sight of the overall program structure. Discerning the inter-relationships between modules becomes harder as the program grows. Even worse, you may have to study code written by somebody else. 
CFlow is a tool for studying code. It scans one or more C source files to generate reports that describe the hierarchy of both defined and invoked functions (external or library functions). Figure 2 shows a program flow tree, one of the reports produced by CFlow. (The analyzed source code is shown in Listing 1 and is adapted from a program in the CUG PD Library. The original author is Richard Threlkeld.) The line indentation indicates the level of function invocation. If the same function is referenced more than once, the line number of the last reference is attached to the beginning of the line. An asterisk (*) indicates that the function is an external or run-time library function. Within the parentheses following a function name are the source filename and the starting line number of the function definition. In order to obtain the desired result, you must specify the dash/slash options appropriately. For example, function names at each level of a CFlow tree are displayed in alphabetical order by default. If you want function names displayed as they are encountered, use the -e option. In addition, when using multiple input files, the -f option is useful for displaying the location of each function. Version 2.0 includes many improvements over the previous version. CFlow now reports function pointers (such as (*a)()) and function addresses (such as f(); a = f;). It also has a virtual memory system that handles programs of unlimited size (true for some of the other tools, too). The biggest improvement is that CFlow now automatically preprocesses your source code. That is, it recognizes #if directives to read and process the appropriate portions of your code. This, however, creates one problem. If a function is actually a macro, it is expanded and replaced with some system-level function, surprising you with an unfamiliar function name in the report, such as _filbuf() instead of getc(). 
This can be solved by turning off the preprocessor with the -p switch, thereby sacrificing all the preprocessor benefits. Along with the CFlow Tree, when you specify the proper dash/slash options CFlow generates a Master Define Function List (a list of callers and callees), an Undefined Function List (a list of external or library functions), and a Function Called-by List (a list of callees and callers). Using CFlow, a programmer can easily and quickly understand how a program is structured and which module is invoked by which module. For a more visual understanding, you can draw a structure diagram like Figure 3, based on the Program Flow Tree. In Figure 3, for example, if a portion of the code in crc_update() is modified, you know from the reports which other functions will be affected (in this case, crc() and crc_finish()). PMon And CritPath PMon is an execution profiler: it determines how much execution time is spent in each symbol (function or BIOS/DOS call) or program area. During program execution, PMon resides in memory with the target program, interrupts the program at regular intervals, and examines the target program's CS:IP register to determine which section of code is currently executing. PMon tallies this information for each interception and, using the symbol entries from the .MAP file, generates a set of reports. I tested PMon using the CRCK (Cyclic Redundancy ChecK) program CRC15.EXE. The program listing of CRC15 is in Listing 1; it must be compiled and linked so as to generate a .MAP file. The .MAP file is then processed by MapVar and loaded into PMon along with the target executable. Figure 4 shows two reports resulting from monitoring CRC15. The first report is the Program Execution Summary, which gives a complete synopsis of the program's execution. Descriptions for certain summary headings are: Total execution clicks. 
The total number of clock ticks recorded during program initiation, execution, and termination. Total monitored hits. The actual number of clock ticks recorded during program execution. Total symbol entries. The total number of symbols (function names) used in the program. Number of symbols hit. The number of symbols detected during execution. Total symbol hits. The total number of times PMon found the program itself executing, as opposed to BIOS, DOS, or other resident programs. Time in program. The total time spent in the program vs. BIOS/DOS functions and other activities (Time below/above). Time in BIOS/DOS. The total time spent in BIOS/DOS functions. According to the Program Execution Summary, CRC15 processed one file within 6 seconds. Although CRC15 contains 115 symbol entries, PMon found only four symbols during program execution, even though it checked CRC15 a total of 92 times. CRC15 made 113 DOS system calls using 12 different DOS calls. Of the 92 checks, PMon found the program executing for 4.76 seconds (79.3%) and BIOS/DOS for 1.24 seconds (20.7%). The second report, the Symbol Execution Summary, shows where a monitored program is executing within itself, excluding DOS calls. Abs Addr -- the starting address (segment:offset) of a symbol. Hits -- the total number of times PMon found a symbol executing. Loc% -- the percentage of a symbol's activity compared with the total execution excluding DOS calls. Tot% -- the percentage of a symbol's activity compared with the total execution including DOS calls. Entry Name -- the symbol name. In this example, PMon detected that function crc_update(), whose starting address is 0:011e, executed 50 times and took 68.5% of the total execution time excluding DOS calls and 54.3% including DOS calls. 
In addition, PMon generates a BIOS Interrupt Summary, a DOS Function Call Execution Summary Report, and a DOS Function Call Execution Detail Report showing statistics on the BIOS/DOS operations performed during execution, such as character input/output and file input/output. Although these reports provide a good amount of information about software performance, further analysis can be done with the CritPath command. CritPath determines the critical path of a program by analyzing the reports generated by the CFlow and PMon commands. A program's critical path is the sequence of functions called from main() that consumes more execution time than any other sequence. Figure 5 shows a Critical Path Report generated by CritPath. The report provides the primary information necessary to improve a program's performance: a list of the 20 functions that used the most execution time (Top 20 Functions in Actual Time), a list of the 20 functions that, by themselves and through the functions they called, used the most execution time (Top 20 Functions in Cumulative Time), and finally a list of the functions that make up the program's critical path. In this example, the critical path runs through the functions crc() and crc_update(). CritPath also generates a Function Summary Report that evaluates the performance of all functions and system calls in the program, and a Weighted Hierarchical Program Flow Tree. Using the statistics produced by PMon and CritPath, programmers can spot places where performance could be improved. These tools only identify the weak spots, however; they don't suggest how to fix them. Such information can be found in books like Supercharging C With Assembly Language by Harry Chesley and Mitchell Waite (The Waite Group). 
Conclusion Overall, compared to the UNIX tools, the Toolbox tools have more options and provide more detailed information, giving the programmer more control over program output. On the other hand, he or she must read the manual very carefully and specify the appropriate options to generate the desired result. Furthermore, the input source code for some tools must be not only syntactically correct but also written in good programming style, even if it compiles cleanly; otherwise, the output can be confusing. For example, an inappropriate choice of options combined with poor programming style (as in Listing 1) causes CFlow to report the identifier crc as a function address rather than a variable (crc is used as both a function name and a variable name; this can be detected by CXref). CFlow also doesn't distinguish between a function invocation and a function declaration inside a function. For beginners, the Toolbox can be a good starting point for using tools to improve productivity, since the commands are very uniform and the manual is well written. Each tool is explained in the manual in a uniform way, using sample results; in particular, the observations and suggestions about the generated reports are honest, good advice for users. For advanced programmers, the combination of CFlow, PMon, and CritPath can provide clues for fine-tuning or improving software performance, either after a program has been developed or when it is about to be updated. CFlow, CPrint, CXref, and CLint can be used to study existing programs and will greatly reduce maintenance costs. 
Figure 1

Figure 2

*** Program Flow Tree ***
-------------------------
 1: main(CRC15.C:4)
 2:    crc(CRC15.C:29)
 3:       crc_clear(CRC15.C:58)
 4:       crc_finish(CRC15.C:80)
 5:          crc_update(CRC15.C:63)
 6:  5    crc_update()
 7:       exit(*)
 8:       fclose(*)
 9:       fopen(*)
10:       fprintf(*)
11:       printf(*)
12:       _filbuf(*)
13:  7 exit(*)
14: 11 printf(*)

Figure 3

Figure 4

*** Program Execution Summary ***

Program executed:                 crc15.exe
Delay/Run period (clicks):        0/0
Start date/time:                  October 19, 1989  19:45:12
Stop date/time:                   October 19, 1989  19:45:18
Elapsed execution time:           0:0:0:6  (6 seconds)
Total execution clicks:           95
Approximate clicks/second:        15.8
Approx sample period (ms):        63.2
Total monitored hits:             92
Total symbol entries:             115
Number of symbols hit:            4
% of total symbols hit:           3.5
Total symbol hits:                73
Avg hits/hit symbol:              18.3
Number of monitored interrupts:   2
Number of interrupts used:        2
% of total monitored:             100.0
Total BIOS interrupt calls:       141
Avg # interrupts/hit:             7.4
Total BIOS interrupt hits:        19
Avg # hits/interrupt:             0.1
Number of DOS calls used:         12
Total DOS program calls:          113
Time in program (secs):           4.76    % of total:  79.3
Time in BIOS/DOS (secs):          1.24    % of total:  20.7
Time below program (secs):        0.00    % of total:   0.0
Time above program (secs):        0.00    % of total:   0.0
Total KNOWN time used (secs):     6.00    % of total: 100.0
Total UNKNOWN time used (secs):   0.00    % of total:   0.0

*** Symbol Execution Summary ***

Abs Addr      Hits   Loc %   Tot %   Entry Name
--------   -------   -----   -----   ----------
      7a        12    16.4    13.0   _crc
     11e        50    68.5    54.3   _crc_update
     3e4         1     1.4     1.1   __chkstk
    1edc        10    13.7    10.9   __aNlshr

--- HINT ---
Concentrate on the following functions to improve your program's
performance:
    _crc         (13.0)
    _crc_update  (54.3)
    __aNlshr     (10.9)

Figure 5

*** Critical Path Report ***
----------------------------

Top 20 Functions in Actual Time
-------------------------------
Rank   Seconds   % Total   Function Name
----   -------   -------   -------------
  1.       3.3     54.3%   crc_update()
  2.       1.0     17.4%   __SysCall_3fH()
  3.       0.8     13.0%   crc()
  4.       0.7     10.9%   _aNlshr()
  5.       0.1      1.1%   _chkstk()
  6.       0.1      1.1%   __SysCall_3dH()
  7.       0.1      1.1%   __SysCall_40H()
  8.       0.1      1.1%   __SysCall_43H()
  9.       0.0      0.0%   crc_clear()
 10.       0.0      0.0%   crc_finish()
 11.       0.0      0.0%   exit()
 12.       0.0      0.0%   fclose()
 13.       0.0      0.0%   fopen()
 14.       0.0      0.0%   fprintf()
 15.       0.0      0.0%   main()
 16.       0.0      0.0%   printf()
 17.       0.0      0.0%   _filbuf()

Top 20 Functions in Cumulative Time
-----------------------------------
Rank   Seconds   % Total   Function Name
----   -------   -------   -------------
  1.       6.0    100.0%   crc()
  2.       6.0    100.0%   main()
  3.       2.7     44.6%   crc_finish()
  4.       2.7     44.6%   crc_update()
  5.       0.8     14.1%   __SysCall_3fH()
  6.       0.5      8.7%   _aNlshr()
  7.       0.0      0.0%   crc_clear()
  8.       0.0      0.0%   exit()
  9.       0.0      0.0%   fclose()
 10.       0.0      0.0%   fopen()
 11.       0.0      0.0%   fprintf()
 12.       0.0      0.0%   printf()
 13.       0.0      0.0%   _chkstk()
 14.       0.0      0.0%   _filbuf()
 15.       0.0      0.0%   __SysCall_3dH()
 16.       0.0      0.0%   __SysCall_40H()
 17.       0.0      0.0%   __SysCall_43H()

The Critical Path
-----------------
 Act    Act     Cum    Cum
  (%)   Rank     (%)   Rank
-----   ----   -----   ----
  0.0     15   100.0      2   main()
 13.0      3   100.0      1   crc()
  0.0     10    44.6      3   crc_finish()
 54.3      1    44.6      4   crc_update()

Critical path hits = 62         Total hits = 92
Critical path time = 4.0 secs   Total time = 6.0 secs
% of total = 67.4

Table 1

Toolbox Volumes I & II   UNIX tools    Description
==================================================
Cat                      cat, cp       Concatenate Data
CharCnt                  wc            Count Characters, Lines...
CFlow                    cflow         Trace C Program Flow
CLint                    lint          C Semantic Checker
CPrint                   cb, indent    C Source Code Beautifier/Reformatter
CritPath                               Critical Path Analyzer
CXref                    xref          C Cross Reference
Detab                    expand        Remove Tabs
Entab                    unexpand      Restore Tabs
ExecTime                 time          Time Program Execution
FileComp                 comp          Compare Files
FileDiff                 diff          Difference Files
FileDump                 od            Dump File
FileList                               List and Find Files
Fill                                   Expand Text Template
MapVar                                 Extract Load Map Variables
PMon                     prof, gprof   Program Performance Monitor
STrip                                  Extract Text
Tail                     tail          Copy End of File
TabTran                  sed           Translate Tabs
TransLit                 tr            Transliterate Characters

Listing 1

#include <stdio.h>

main(argc,argv)
int argc;
char **argv;
{
    int i;
    void crc();

    if (argc <= 1) {
        printf("USAGE:crc15 filename [filename...]\n");
        exit(1);
    }
    for (i = 1; i < argc; i++) {
        printf("\n%-s ", argv[i]);
        crc(argv[i]);
    }
    exit(0);
} /* main */

/*
 * CRC
 * Cyclic Redundancy Check
 */
void crc(argv)
char *argv;
{
    FILE *fd;
    int crc;
    int c;
    char crc_char;
    int crc_clear(), crc_update(), crc_finish();

    fd = fopen(argv, "rb");
    if (!fd) {
        fprintf(stderr, "Can't open %s !\n", argv);
        exit(1);
    }
    crc = crc_clear();
    while ((c = getc(fd)) != EOF) {
        crc_char = c;
        crc = crc_update(crc, crc_char);
    }
    crc = crc_finish(crc);
    printf("%04x", crc);
    fclose(fd);
} /* crc */

int crc_clear()
{
    return(0);
}

int crc_update(crc, crc_char)
int crc;
char crc_char;
{
    long x;
    int i;

    x = ((long)crc << 8) + crc_char;
    for (i = 0; i < 8; i++) {
        x = x << 1;
        if (x & 0x01000000)
            x = x ^ 0x01A09700;
    }
    return(((x & 0x00ffff00) >> 8));
}

int crc_finish(crc)
int crc;
{
    return(crc_update(crc_update(crc, '\0'), '\0'));
}

Publisher's Forum I've been reading documentation. It's no fun. Here's some advice from an experienced "how-to" writer, who's also an experienced programmer, about how documentation should be structured to be useful. Include an extended procedural tutorial. This section is for the user who doesn't have enough prior experience with similar products to guess what to do next. 
Don't mix tips about advanced tricks into this section, or cautions about product limitations and quirks. If you do, the user won't be able to find those important tidbits later without re-reading the entire section. In every "how-to" piece, focus is everything: give the procedural outline and just the procedural outline. Include a goal-oriented "Tips & Techniques" section. No matter what fruity name you give your product, there will be certain non-obvious tricks that make it more productive. Organize these by goals -- e.g. Printing Fields From A Join, Timestamping A File, Converting File Formats. This section should be rife with cross-references and redundancy. Each goal's discussion should at least cross-reference related material that appears elsewhere, and include all the other "extraneous" information you were tempted to toss in as asides in the procedural section. Short, well-targeted examples belong here. Even if your product is "truly unique", the goals should be stated in terms of commonly recognized paradigms so that my experience with similar projects can speed my adaptation to your product. Include a thorough technical specification. No, technical specifications don't help the beginner, but they are invaluable to an experienced user. Cross-reference the specs. Include hardware requirements, interface specifications, data structure templates, file specifications, and command-line syntax for subordinate modules (even for those modules that are normally invoked by some "integrated environment driver" -- don't presume to know better than the programmer what he needs to know to get the job done). Explain the design goals and philosophy. Virtually every product started in a specific environment with a specific, limited application in mind. Yes, marketing will want to promote the product as everything for everyone, but make room somewhere in the document for the truth. 
Sharing the design philosophy helps the programmer understand where the product fits and reduces the early frustration level. If I'm trying to use your tool in a development project, and I know the design goals that produced the tool, I stand a better chance of designing a project that can be built with the tool. Invest in a superb index. So what if the answer to my question is in the manual. How many times can I afford to read a 900-page primer to find the two lines that are critical? The answer is a very small integer; I'm going to be calling customer support. Get your ego and marketing's time-table out of the way and hire a professional to prepare a SUPERB index. Every dollar spent on an index will be returned ten-fold in reduced customer support costs. Explain the installation process for standard environments, and then explain what configuration options are available and how they interact. Give me this information even if you do bundle a whizbang installation utility. I've probably been at this long enough to have my own ideas about where to put my working files. In short, keep your reader in mind. Design your documentation to meet the user's needs over his entire life as a user: a detailed step-by-step to orient the beginner; well-packaged goal-organized information to support the exploration and growth of the intermediate user; and comprehensive, frank, and well-indexed reference material for the experienced and technically advanced user. I mean it. Robert Ward Editor/Publisher New Products Industry-Related News And Announcements UNIX Alternative Announced For The Apple Macintosh Technical Systems Consultants, Inc. has released a UNIX compatible, real-time operating system for the Apple Macintosh family. The system, UniFLEX, supports multi-tasking and multi-users and comes complete with all development tools, a C Compiler, TCP/IP Networking support and X Window System v11.3 software. 
A version has also been released for Force Computer's CPU-37 single-board VMEbus computer with integrated Ethernet hardware. For the Apple Macintosh family, the price for a single-system development license is $595. The price includes 90 days of phone support. For the Force CPU-37, the single-system licensing price is $1000 for UniFLEX/RT or $1800 for UniFLEX/RN with networking. Contact Technical Systems Consultants, Inc., 111 Providence Road, Chapel Hill, NC 27514 (919) 493-1451; FAX (919) 490-2903. Stepstone Updates Objective-C The Stepstone Corporation has released its Objective-C Compiler v4.0 running under MS-DOS and Microsoft's OS/2. The compiler implements a C-based hybrid object-oriented language and is ANSI C compatible. Objective-C v4.0 requires a PC/AT or PS/2 class machine running MS-DOS and Microsoft C v5.0. The compiler, packaged with a basic data structures library (ICpak101) and built-in extended memory support, is $249. Stepstone has also released its object-oriented user interface toolkit, ICpak201, for workstations running the X Window System v11. Product information is available from the Stepstone Corporation at (203) 426-1875, (800) 289-6253 or by mail to The Stepstone Corporation, 75 Glen Road, Sandy Hook, CT 06482. Lattice's New 6.0 Release Features ANSI Compliance Lattice, Inc. is shipping v6.0 of its C compiler for MS-DOS and OS/2. The release features major enhancements to the compiler, a global optimizer, new programming utilities, and a number of new library functions. Both the compiler and libraries are now ANSI compatible. Version 6.0 contains a new global optimizer, automatic register variable support, in-line function support, optimized libraries, and upgrades to the compiler. The Lattice C Compiler v6.0 allows program modules compiled under different memory models to be linked into a single program. Lattice v6.0 now includes LASM, a full-featured macro assembler with support for 386 systems. 
LASM is compatible with MASM, and its output is compatible with CodePRobe, so assembly language programs can also be debugged at source level. Utilities now bundled with the compiler are an overlay linker, a MAKE facility, a BIND utility, and several UNIX-like tools including EXTRACT, BUILD, DIFF, GREP, SPLAT, TOUCH, and WC. Programmer's tools in the compiler package include the CodePRobe source-level debugger, an integrated editor, an object module disassembler, an object module librarian, and an automatic installation program. In addition to the OS/2 API and special graphics libraries in the previous version, Lattice adds its Curses screen management library, a communications library, the dBC III library of database functions, and a protected-mode OS/2 library. The new list price of $250 includes unlimited free technical support through Lattice's telephone hotline, bulletin board, MIX network, or written correspondence. Lattice provides an unconditional, 30-day money-back guarantee with each product. For further information, contact: Lattice, Inc., 2500 South Highland Avenue, Lombard, IL 60148 (312) 916-1600; FAX: (312) 916-1190. Greenleaf CommLib V3.0 Released Greenleaf Software has released a new version of its communications library, CommLib. Greenleaf CommLib v3.0 includes Kermit, XModem, XModem 1K, and YModem batch file transfer protocols. It fully supports automatic RTS-CTS hardware flow control, Hayes modem control functions, and XON/XOFF software flow control. CommLib automatically filters up to three codes from the receive stream, stores status along with data in a "WideTrack Receive" mode, and can programmatically ignore or react to modem status at the interrupt service level. The Greenleaf CommLib supports the PC, XT, AT, PS/2, and compatible machines using COM1 and COM2 ports, and COM3..COM8 on a PS/2. It also supports up to 35 ports when using multiport boards. 
It can serve several families of multi-port boards, including Digiboard, Stargate, Arnet, Contec, Quatec, and Quadram. CommLib v3.0 is $299. For additional information and a free Demo disk, contact Greenleaf Software, Inc.; 16479 Dallas Parkway, Suite 570; Dallas, TX 75218; (800) 523-9830; FAX (214) 248-7830. XVT Now Runs On MS-DOS, OS/2 And UNIX New character-based versions of GSS' XVT Extensible Virtual Toolkit are available for MS-DOS, OS/2 and UNIX programmers. XVT allows programmers to support character displays with applications that feature windowing, pull-down menus, dialog boxes, scroll-bars and other graphical user-interface features. The same application source code can support the Windows, PM and Mac GUIs. Versions for Windows, PM, Macintosh and UNIX list for $595. XVT carries no run-time redistribution royalties. The company is located at 9590 SW Gemini Drive, Beaverton, Oregon 97005 (503) 641-2200; FAX: (503) 643-8642. Helios Enhances Proteus System Helios Software has released a new version of its prototype/demo system, Proteus. Proteus v4.5 enables software developers to build functional prototypes, marketing demos, tutorials and other interactive presentations. Version 4.5 offers an integrated environment to build character-based demos and bitmapped demos in any of 23 graphics formats. Both full-screen and overlay images can be displayed, using 26 different video effects. Designers can create screens with the built-in Screen Painter or configure Proteus to execute any external paint program. Captured screens can also be incorporated into demos. Proteus is $199 for a three-disk set, with examples in source code. There is a 30-day money-back guarantee, no royalties for distribution and no sign-on screen. Required hardware configurations depend on the graphics mode used, ranging from monochrome text to super-VGA. The Helios order number is (800) 634-9986, or contact them at P.O. Box 22869, Seattle, WA 98112 (206) 324-7208. 
High C V1.6 Includes 486 Support MetaWare Inc. has released its ANSI-conformant High C compiler v1.6 for 386/DOS on the 80386 and the 80486 in protected mode. Protected mode on the 386 and 486 is supported in conjunction with MS-DOS extenders. Specific support for the 486 is provided under toggle control. MetaWare has also released High C v1.6 for OS/2 and real-mode MS-DOS. Version 1.6 features expanded libraries, new documentation, two editors, a disk cache utility, a B-tree library, and a graphics library for the 80386/486 in protected mode. Users also get MetaWare's new make facility, and DOS Helper, which is a set of UNIX-style utilities for the MS-DOS operating system. This upgrade comes with the GFX/386 Graphics library, produced in conjunction with C Source. GFX for the 80386 is a user-transparent port of the C Source GFX graphics package. The graphics package provides specific floating-point graphics functions; MetaWare is providing additional libraries that support the 80387 and Weitek Abacus. High C also includes the EC editor from C Source, the HyperDisk disk cache from HyperWare, and source code for the MicroEMACS editor. In addition, v1.6 will be bundled with two products from Sterling Castle: the BlackStar/386 "C" Function Library and BPTPlus in C. These products provide data retrieval capabilities and over 300 additional library functions. Sterling Castle's BlackStar/386 C libraries and the GFX/386 Graphics library are available only through MetaWare. Please refer inquiries to MetaWare Incorporated, 2161 Delaware Avenue, Santa Cruz, CA 95060-5706 (408) 429-6382; FAX (408) 429-9273. Prototyping Tools Combined Genesis Data Systems has consolidated its line of prototyping and presentation products, formerly sold as RADs and RPS, into a single system named "ProtoFinish." ProtoFinish is a versatile system for creating prototypes, demos, tutorials and other presentations. 
It includes a screen design module for building ASCII-based screens, a memory-resident utility for capturing text or CGA graphics screens, a music module for adding sound, a flexible 4th-generation language for accurately simulating the look and feel of a program, and a royalty-free run-time utility for distribution. Libraries of assembly language routines, primarily for incorporating screens in C, PASCAL, BASIC, and Clipper code, are included for the programmer. Contact Genesis Data Systems, 8415 Washington Place NE, Albuquerque, NM 87113 (505) 821-9425; FAX (505) 821-9695. LISP Objects Sapiens Software has released a beta test version of its Common LISP Object System (CLOS) implementation. CLOS supports generic functions and methods (rather than message passing), and multiple inheritance of object slots. Star Sapphire CLOS is embedded in the Star Sapphire LISP v3.1 run-time, written in C, which eliminates CLOS loading time. Star Sapphire LISP runs on any PC compatible with 640Kb and a hard disk; extended memory can be used if installed. The product is $99.95 from Sapiens Software Corporation, P.O. Box 3365, Santa Cruz CA 95063 (408) 458-1990. Faircom Offers 'Special Edition' The Faircom Corporation has released a new application development toolbox, which includes the d-tree development environment, file management system and report generation system. Faircom is introducing this product with a $695 "Special Edition" package and a 30-day, no-risk trial offer. For more information, contact Faircom at (800) 234-8180, 4006 West Broadway, Columbia, MO 65203; FAX (315) 445-9698. Oakland Updates Screen Tools Oakland Group, Inc. has released v3.1 of the Look & Feel screen designer and the C-scape interface management system. Look & Feel lets you prototype and simulate screens, and automatically turn screens into C source code that will run across MS-DOS, OS/2, UNIX, and VMS. 
The new version of C-scape allows for total portability, has fewer levels of indirection, and creates smaller executables. MS-DOS and OS/2 versions of C-scape with Look & Feel cost $399, including source code. Look & Feel costs $149; C-scape $299. UNIX versions begin at $999. Look & Feel source code costs $900. For more information, contact Oakland Group, Inc., 675 Massachusetts Avenue, Cambridge, MA 02139 (800) 233-3733 or (617) 491-7311. New Linker Pocket Soft, Inc., has released .RTLink/Plus, an advanced overlay linker which supports debugging of programs with multiple and nested overlays with Microsoft's CodeView debugger. .RTLink/Plus also provides a unique link-time Profiler, which gives a detailed performance analysis in timing intervals which are user-adjustable to thousandths of a second. Pocket Soft is an authorized licensee of Microsoft CodeView information. .RTLink/Plus has a list price of $495 and is available through most common distribution/reseller channels and direct from Pocket Soft, Inc., 7676 Hillmont, Suite 195, Houston, TX 77040 (713) 460-5600. Tool Writes Dialog Box Source Code The Software Organization, Inc. has released DialogCoder, a programming tool that eliminates as much as 95 percent of the coding normally associated with windows dialog box programming. DialogCoder automatically generates C source code from dialog templates to manage all controls in the dialog; it uses graphical metaphors to express the relationships between dialog controls and actions, which eliminates most of the conventional dialog control programming. It also allows users to interactively specify the state of each dialog control during initialization and command processing. DialogCoder requires a 286-or 386-based machine with Windows 2.X. A Microsoft-compatible mouse is optional. DialogCoder is $349. To order, contact the Software Organization, Inc. at (800) 696-2012. 
Trio Releases C-Index/PC Trio Systems has started shipping a new $195 C database library, C-Index/PC. The new product, based on their C-Index/Plus package, allows C programmers to incorporate database features into their applications running under Microsoft Windows, OS/2, and MS-DOS. The C-Index/PC database library supports single-user and multi-user LAN applications with full file management facilities. Complete source code is supplied with C-Index/PC and can be adapted for use with any PC compiler and operating system running on an Intel microcomputer. Product features include: precompiled libraries for Microsoft C and Turbo C, B+Tree indexing, variable-length records, direct and sequential access, and multiple record formats per file. There are no application royalties. For more information, call (818) 798-5567. New Debug Tool Traces Memory References TUITS Inc. has introduced Dr. MD, a run-time memory-tracking utility that finds memory overwrite bugs before an application crashes. Dr. MD catches memory overwrites when they happen. It also catches free()s on invalid pointers, and dangling pointers. Dr. MD will not allow you to overwrite allocated or automatic variables. When Dr. MD finds a problem, it reports the source file and line number where the problem was found, as well as where the space was allocated. No heap walking is needed. Dr. MD comes as source, and you compile it with your own compiler to fit your environment. The vendor claims it should work with any ANSI standard compiler, and that it has worked successfully in MS-DOS and UNIX System V environments. Dr. MD supports all the string library functions as well as memset, memcpy, and limited support of sprintf. Dr. MD sells for $59.95, and includes source code, a manual, and some hints on memory management. For more information contact TUITS Inc., 411 N. Shields, Fort Collins, CO 80521, or call at (303) 224-9070. 
AtLast Offers Overlay Tools AtLast Software has released two new products: Overlay Architect, which automates the process of overlay construction, and Overlay Optimizer, which analyzes the performance of the program's overlay structure, then determines how to rebuild the overlays for the best performance in a given amount of space. AtLast Software will also custom build an overlay structure for developers who do not want to build their own. Overlay Architect sells for $369; Overlay Optimizer for $269. They can be purchased together for $569. Quantity discounts are available. Custom-built structures are priced individually. MicroWay 486 Compilers For C, Pascal & FORTRAN MicroWay has released its 80486-targeted series of compilers, NDP C-486, NDP Fortran-486, and NDP Pascal-486. Each of the NDP-486 compilers includes a "scheduler/code generator" that aligns code and data on paragraph boundaries, detects and minimizes prefetch buffer starving, uses new code sequences that run faster on the 80486 than the 80386, and incorporates a new strategy for driving the Weitek 4167 high-speed coprocessor. They also provide a library of 70 device-independent graphics, keyboard, and sound routines. NDP C-486, NDP Fortran-486, and NDP Pascal-486 generate globally optimized, 32-bit native code that runs in protected mode under UNIX 386 System V v3.0, SCO XENIX 386 v2.3, and Phar Lap extended DOS. The compilers support the 486's built-in FPU and the Weitek 4167 numeric coprocessor. NDP C-486 is a two-dialect compiler that passes 100 percent of the Plum Hall validation suite for UNIX System V C and 95 percent of the tests for the new ANSI C standard. It includes an inline assembly language interface that simplifies the writing of embedded code by allowing the programmer to specify register values and generate interrupts. The MS-DOS, UNIX, and XENIX versions of NDP C-486, NDP Fortran-486, and NDP Pascal-486 retail at $1195 each. The C++ preprocessor lists at $495. 
All of the compilers include one year of free updates. Users should contact MicroWay's Technical Support Staff at (508) 746-7341 for more information. DOS Extender Supports Turbo C Eclipse Computer Solutions, Inc.'s OS/286 MS-DOS extender now supports Borland's Turbo C v2.0 and will soon support Turbo Pascal as well. The MS-DOS extender products of Eclipse Computer Solutions, Inc. (formerly A.I. Architects) exploit the protected-mode operation of the 80286 and 80386 processors and make it possible to create, with conventional development tools, applications that are not restricted by normal MS-DOS memory limits. Contact Eclipse Computer Solutions, Inc., One Intercontinental Way, Peabody, MA 01960 (508) 535-7510; FAX: (508) 535-7512. T & T Enhances Data Junction Tools & Techniques has released Data Junction v3.01. The new version adds formats, an improved user interface, an expanded EZ-Convert mode, speed improvements of 300 percent and more, built-in case translation, and new conversion filters. MS-DOS licenses are $99 for Data Junction: Standard, $199 for Data Junction: Professional, and $299 for Data Junction: Advanced. UNIX/Xenix and LAN licenses start at $495. Data Junction is written in C, and distribution/OEM licenses are also available. For more information, contact Michael Hoskins at Tools & Techniques Inc., 1620 West 12th Street, Austin, TX 78703 (800) 444-1945, or (512) 482-0824. LALR Adds Scanner Generator To Version 3.2 LALR Research has released LALR v3.2, which features the following improvements over v3.0. A lexical scanner generator is included, providing a 10 percent increase in syntax-checking speed over the previous hand-written scanner. An option has been added to generate 0-40 percent smaller parsers. Multiple parsers can exist in an application program. Parsers can read input files of unlimited size. The input grammar format for the new version is fully compatible with previous versions. 
LALR v3.2 is $249 and comes with a 60-day, money-back guarantee. Upgrades from LALR v3.0 are $150. Shipping is $6. For more information, contact LALR Research at PO Box 4722, Chico, CA 95927 (916) 345-0916.

Solbourne Updates OS/MP

Solbourne Computer, Inc., has shipped the latest version of its multiprocessing operating system, OS/MP v4.0A, which is based on SunOS v4.0.1, licensed from Sun Microsystems Inc. OS/MP v4.0A introduces a set of system administration tools to handle user account maintenance, group account maintenance, network group account maintenance, network account maintenance, NFS client maintenance, NFS server configuration, and modem installation. OS/MP v4.0 also includes two new X Window tools. Smail is a user-friendly interface to the standard UNIX mail environment. Sproperty displays the property of any visible X Window. Contact Solbourne Computer, Inc. at 1900 Pike Road, Longmont, CO 80501 (303) 722-3400; FAX: (303) 772-3646.

Belief Maintenance Using The Dempster-Shafer Theory Of Evidence

Dwayne Phillips

The author works as a computer and electronics engineer with the U.S. Department of Defense and is a doctoral candidate in Electrical and Computer Engineering at Louisiana State University. His interests include computer vision, artificial intelligence, software engineering, and programming languages. He first used the Dempster-Shafer theory of evidence in 1984 and uses it extensively in his PhD research into computer vision.

An expert system makes a decision given an amount of evidence. Usually it must choose between several competing answers or hypotheses. The human expert keeps these answers in his mind while he thinks over the problem. He gathers evidence and shifts his thoughts from one answer to another. After gathering evidence, he chooses the most favorable answer. We all do this in our daily decisions, but we don't think about the process, and we certainly don't keep track of specific numbers in our head.
An expert system needs a sub-system to pool evidence and reach decisions: a belief maintenance system. The belief maintenance system keeps track of the hypotheses and the degree of belief attributed to each hypothesis. When the expert system finishes gathering evidence, the belief maintenance system chooses the answer.

In some expert systems a belief maintenance system is not necessary, because they make decisions based on a single, clear-cut piece of evidence. For instance, suppose an expert system has the task of rolling up the windows in your car. The evidence is whether or not it is raining. The system would check the atmosphere and ask, "Is it raining?" If the answer were yes, it would roll up the windows.

In other expert systems a belief maintenance system is essential. Suppose the expert system had to decide at 9:00 AM whether or not to roll up the windows at 3:00 PM. Now the question is tougher. Evidence would include the daily weather forecast, the wind speed and direction, the relative humidity, weather records from past years, forecasts from the Farmer's Almanac, satellite photographs, and other relevant sources. The expert system would pool all the evidence and arrive at an answer.

Consider the nature of evidence. Some evidence is not reliable (the weatherman is sometimes right and sometimes wrong). Some evidence is uncertain (an intermittent atmospheric reading). Some is incomplete (the wind speed by itself does not tell us much). Some evidence is contradictory (the weatherman's forecast and the current atmospheric conditions). Finally, some evidence is incorrect (a broken atmospheric sensor or a wrong weather forecast). The belief maintenance system must deal with these factors, taking the evidence, assigning a measure of belief to each hypothesis, and changing this belief as new evidence becomes available. The resulting decision must be the same regardless of the order in which the system gathers the evidence.
The method of belief maintenance that most of us know is classical probability. The basic properties of this system are [Beyer]:

A) P(∅) = 0 (null set)
B) P(Q) = 1 (entire sample set)
C) P(A) = 1 - P(A')
D) P(A ∪ B) = P(A) + P(B), if A and B are mutually exclusive
E) P(A ∩ B) = P(A) * P(B), if A and B are independent

Another belief maintenance system came from the MYCIN project (a pioneering medical expert system developed in the early seventies by Edward Shortliffe at Stanford). MYCIN used a system of certainty factors to keep track of hypotheses. Shortliffe later dropped the certainty factor system for the Dempster-Shafer theory of evidence.

The Dempster-Shafer (D-S) theory of evidence was created by Glenn Shafer [Shafer, 1976] at Princeton. He built on earlier work performed by Arthur Dempster. The theory is a broad treatment of probabilities, and includes classical probability and Shortliffe's certainty factors as subsets. In the D-S theory of evidence, the set of all hypotheses that describes a situation is the frame of discernment. The letter Q denotes the frame of discernment. The hypotheses in Q must be mutually exclusive and exhaustive, meaning that they must cover all the possibilities and that the individual hypotheses cannot overlap.

The D-S theory mirrors human reasoning by narrowing its focus gradually as more evidence becomes available. Two properties of the D-S theory permit this process: the ability to assign belief to ignorance, and the ability to assign belief to subsets of hypotheses. An example provides the easiest way to understand these properties and how they differ from classical probability. Suppose we want to decide which of three persons in an office -- Adam, Bob, and Carol -- will come in early to turn on the lights and make coffee. In the D-S theory the set Q = {Adam or Bob or Carol}. The sets {Adam}, {Bob}, and {Carol} are the mutually exclusive and exhaustive hypotheses. They are singletons.
In the frame of discernment there are 2^3 = 8 possible interpretations (Figure 1). Figure 1 contains two special sets: {∅} and {Adam, Bob, Carol}. The first is the null set, which cannot hold any value. As later examples will show, the null set normalizes beliefs. The second special set is {Adam or Bob or Carol}, represented by Q. Assigning belief to Q does not help distinguish anything. Therefore, Q represents ignorance.

Representing ignorance is a key concept. Humans often give weight to the hypothesis "I don't know," which is not possible in classical probability. Assigning belief to "I don't know" allows us to delay a decision until more evidence becomes available. This mirrors the human tendency to procrastinate.

Suppose that, given a piece of evidence, we make the assertion shown in Figure 2. The D-S theory calls such an assertion a basic probability assignment. The m in Figure 2 represents the measure of belief. The assertion of Figure 2 says that we believe Adam is the best choice, with a weight of 0.6. We give the other 0.4 of belief to Q or "I don't know," thus allowing us to delay deciding on Adam. We cannot make this type of assertion in classical probability. The classical system's property of complements given earlier forces us to give Adam' (the complement of Adam) 0.4 if we give 0.6 to Adam. In this case Adam' = {Bob or Carol}. Notice the difference between Q = {Adam or Bob or Carol} and Adam' = {Bob or Carol}. Adam' gives more belief to Bob and Carol than we want. Q allows us to express a true "no comment" on the situation.

Assigning belief to subsets in the D-S theory allows us to assign belief to a general concept instead of being too specific. Suppose in our example that the local police advise us that women should not come to work early by themselves. We would make an assertion like the one shown in Figure 3. This assertion gives a weight of 0.7 to the subset {Adam or Bob} and a weight of 0.3 to ignorance, or "no comment."
Classical probability does not permit a subset assertion. Recall that property D requires P(Adam or Bob) = P(Adam) + P(Bob). That property would force us to assign specific beliefs to Adam and to Bob individually. We do not want to be that specific. We want to procrastinate and think it over some more. Also, property C would make us assign the 0.3 to the complement of {Adam or Bob}, which is {Carol}. We do not want to assign 0.3 to {Carol}. Assigning belief directly to {Carol} would contradict the evidence the police gave us.

The D-S theory employs Dempster's rule of combination to combine two assertions. The mathematical formulas may be found in the references. They confuse the best of us, but they are simple when illustrated. Figure 4 shows how the two assertions combine. The table in Figure 4 is an intersection tableau, which lists one assertion across the top and one down the side. Inside the tableau are the intersections of the sets in the rows and columns, with the products of the corresponding beliefs assigned to the intersections. The measures of belief inside the table sum to the final values given below the table. Notice how combination narrows the decision process. The single set {Adam} now has the highest belief. The subset {Adam or Bob} comes in second, with "no comment" last.

Now suppose that we require the first person in the office in the morning to bring up the computer system. Carol is an expert at this, so we make the assertion shown in Figure 5. This attributes most of the belief to {Carol}. This new requirement or piece of evidence contradicts the previous evidence given by the police. That is the nature of evidence. Dempster's rule of combination allows us to combine the contradictory evidence and draw a logical conclusion. Figure 6 shows the combination of the result of Figure 4 and the assertion of Figure 5. This time the intersection tableau contains the null set.
There is no intersection between the set {Carol} and the set {Adam}, and there is also no intersection between the set {Carol} and the set {Adam, Bob}. The null set cannot hold any value. Therefore, it normalizes the beliefs of the other subsets. The sum of the beliefs of the other subsets is divided by one minus the belief in the null set. The beliefs of all the subsets then sum to one. The bottom of Figure 6 shows this extra step. As a result, Carol is now the choice for coming in early in the morning. If she is unable to do so, then Adam is the logical replacement. If Adam is unavailable, then Bob comes in early.

Implementation

The preceding examples show that no complex mathematics is involved in combining two assertions. Dempster's rule of combination uses simple addition, subtraction, multiplication, and division. The only tricky part is computing the intersections of the sets in the tableau. There are several ways to solve the intersection question. Since there are three singletons and 2^3 = 8 total interpretations, we'll represent the hypotheses with three bits, as in Figure 7.

Listing 2 shows the C function that combines two assertions. The inputs are two belief vectors, each holding an assertion. The belief vector is a one-dimensional array of floats. In our examples, LENGTH_OF_BELIEF_VECTOR is eight because we have three singletons and 2^3 = 8. The belief vector has a space, or slot, for each hypothesis, ordered as in Figure 7. The belief vector is awkward to initialize, since we would like Adam in slot one, not slot four, and Carol in slot three, not slot one. Nevertheless, a uniform belief vector allows a very simple subroutine to combine the assertions. The first for loop initializes sum_vector, the belief vector that holds the sums of the values found inside the intersection tableau. sum_vector holds the sums until normalization occurs. The loop over a goes through the belief vectors, finds the intersections, and calculates the products.
The two if ... > 0.0 tests reduce processing time by eliminating unnecessary multiplications by zero. The function uses the C bitwise AND operator & to find the intersection of sets. Without the bitwise AND, the function would be much longer and much more complex. The last for loop performs the normalization. The values in sum_vector are divided by one minus the value assigned to the null set. The answer is stored in vector1.

The combine_using_dempsters_rule function is the meat of the program, written in Turbo C v1.5. I used this compiler because it had a few functions that made the user interface more pleasant. Except for those functions, there is nothing in the program that is machine, compiler, or operating system specific.

One important note about implementing Dempster's rule of combination: the number of calculations depends on 2^|Q|. In our example there were eight hypotheses. By contrast, 200 single hypotheses would produce 2^200 subsets, 2^200 slots in the belief vector, and 2^200 floating point calculations. This gets out of hand rather quickly. Several of the references [Gordon, Shortliffe 1985] [Shafer 1985] [Shafer 1987] deal exclusively with this topic. The discussion and proposed solutions are beyond the scope of this article.

Conclusion

The Dempster-Shafer theory of evidence is one method that an expert system may use to keep score on competing hypotheses while it gathers evidence and draws a logical conclusion. It is more general and capable than the classical probability with which most of us are familiar. It is easy to implement and executes quickly as long as the number of hypotheses is manageable. I suggest you try it on your next expert system or AI-related project.

References

Beyer, William H., CRC Standard Mathematical Tables, 26th edition, CRC Press, 1983, pp. 503-559.

Gordon, Jean, Edward H. Shortliffe, "The Dempster-Shafer Theory of Evidence," pp. 272-292 of Shortliffe, Edward H., Bruce G.
Buchanan, eds., Rule Based Expert Systems, Addison-Wesley Publishing Company, 1984.

Gordon, Jean, Edward H. Shortliffe, "A Method for Managing Evidential Reasoning in a Hierarchical Hypothesis Space," Artificial Intelligence, Vol. 26, No. 3, July 1985, pp. 323-357.

Shortliffe, Edward H., Bruce G. Buchanan, eds., Rule Based Expert Systems, Addison-Wesley Publishing Company, 1984.

Shafer, Glenn, A Mathematical Theory of Evidence, Princeton University Press, 1976.

Shafer, Glenn, "Hierarchical Evidence," The Second Conference on Artificial Intelligence Applications, IEEE Press, December 1985, pp. 16-21.

Shafer, Glenn, Roger Logan, "Implementing Dempster's Rule for Hierarchical Evidence," Artificial Intelligence, Vol. 33, No. 3, November 1987, pp. 271-298.

Figure 1  Frame of Discernment for the Case of Adam, Bob, and Carol

    {Adam, Bob, Carol}
    {Adam, Bob}  {Adam, Carol}  {Bob, Carol}
    {Adam}  {Bob}  {Carol}
    {∅}

Figure 2  An Assertion Showing the Use of Ignorance

    m{Adam} = 0.6
    m{Q}    = 0.4

Figure 3  An Assertion Showing Belief Assigned to a Subset

    m{Adam, Bob} = 0.7
    m{Q}         = 0.3

Figure 4  Combining Two Assertions Using Dempster's Rule of Combination

Figure 5  A New Assertion

    m{Carol} = 0.9
    m{Q}     = 0.1

Figure 6  Combining the Result of Figure 4 with Figure 5

Figure 7  Using Three Bits to Represent the Hypotheses

    bits    hypothesis
    000     {∅}
    001     {Carol}
    010     {Bob}
    011     {Bob, Carol}
    100     {Adam}
    101     {Adam, Carol}
    110     {Adam, Bob}
    111     {Adam, Bob, Carol} or {Q}

Listing 1

/*******************************************************************
 * file d:\tc\cujds.c
 *
 * Functions: This file contains
 *    main
 *    display_belief_vector
 *    clear_belief_vector
 *    enter_belief_vector
 *    combine_using_dempsters_rule
 *
 * Purpose:
 *    This program demonstrates how to implement Dempster's
 *    rule of combination.
 *
 * NOTE: This is written for Borland's Turbo C
 *    Version 1.5. This allows us to use some
 *    nice user interface functions. The actual
 *    combination code is compiler independent.
 *******************************************************************/

extern unsigned int _stklen = 40000;

#include "d:\tc\include\stdio.h"
#include "d:\tc\include\io.h"
#include "d:\tc\include\fcntl.h"
#include "d:\tc\include\dos.h"
#include "d:\tc\include\math.h"
#include "d:\tc\include\graphics.h"
#include "d:\tc\include\conio.h"
#include "d:\tc\include\sys\stat.h"

#define LENGTH_OF_BELIEF_VECTOR 8

main()
{
   char  response[80];
   int   choice, i, j, not_finished;
   short place;
   float a[LENGTH_OF_BELIEF_VECTOR], belief,
         v[LENGTH_OF_BELIEF_VECTOR];

   textbackground(1);
   textcolor(7);
   clrscr();

   not_finished = 1;
   while(not_finished){
      clrscr();
      printf("\n> You may now either:");
      printf("\n     1. Start the process");
      printf("\n     2. Enter more assertions");
      printf("\n     3. Exit program");
      printf("\n  _\b");
      get_integer(&choice);

      switch(choice){

         case 1:
            clear_belief_vector(v);
            clear_belief_vector(a);
            clrscr();
            enter_belief_vector(v, 1);
            clrscr();
            enter_belief_vector(a, 1);
            clrscr();
            printf("\n> Initial Belief Vector\n");
            display_belief_vector(v);
            printf("\n> Second Belief Vector\n");
            display_belief_vector(a);
            combine_using_dempsters_rule(v, a);
            printf("\n> Resultant Belief Vector\n");
            display_belief_vector(v);
            break;

         case 2:
            clrscr();
            clear_belief_vector(a);
            enter_belief_vector(a, 1);
            clrscr();
            printf("\n> Initial Belief Vector\n");
            display_belief_vector(v);
            printf("\n> Second Belief Vector\n");
            display_belief_vector(a);
            combine_using_dempsters_rule(v, a);
            printf("\n> Resultant Belief Vector\n");
            display_belief_vector(v);
            break;

         case 3:
            not_finished = 0;
            break;

      }  /* ends switch choice */
   }  /* ends while not_finished */
}  /* ends main */



clear_belief_vector(v)
   float v[];
{
   int i;
   for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++)
      v[i] = 0.0;
}  /* ends clear_belief_vector */



display_belief_vector(v)
   float v[];
{
   char response[80];
   int  i, j;

   j = 0;
   for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++){
      if(v[i] > 0.0001){
         printf("   [%3d]=%6f", i, v[i]);
         j++;
      }
   }
   printf("\n   Hit RETURN to continue");
   read_string(response);
}  /* ends display_belief_vector */



enter_belief_vector(v, line)
   float v[];
   int   line;
{
   int   i, not_finished, y;
   float value;

   y = line;
   printf("\n> ENTER BELIEF VECTOR");
   printf("\n>   Enter the place (RETURN) and value (RETURN)");
   printf("\n>   (Enter -1 for place when you're finished)");

   not_finished = 1;
   while(not_finished){
      printf("\n   [__]=___");
      y = wherey();
      gotoxy(5, y);
      get_integer(&i);
      gotoxy(10, y);
      get_float(&value);
      if(i != -1){
         v[i] = value;
      }  /* ends if i != -1 */
      else
         not_finished = 0;
   }  /* ends while not_finished */
}  /* ends enter_belief_vector */



/***************************************************************
 *
 * This is the function that implements Dempster's rule
 * of combination.
 * vector1 holds the original beliefs and will hold the
 * result of the combination.
 *
 ***************************************************************/

combine_using_dempsters_rule(vector1, vector2)
   float vector1[LENGTH_OF_BELIEF_VECTOR],
         vector2[LENGTH_OF_BELIEF_VECTOR];
{
   float denominator, sum_vector[LENGTH_OF_BELIEF_VECTOR];
   int   a, i, place;

   /* set the sums to zero */
   for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++)
      sum_vector[i] = 0.0;

   for(a=0; a<LENGTH_OF_BELIEF_VECTOR; a++){
      if(vector2[a] > 0.0){
         for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++){
            place = i & a;    /* bitwise AND = set intersection */
            if(vector1[i] > 0.0)
               sum_vector[place] = (vector1[i] * vector2[a])
                                      + sum_vector[place];
         }  /* ends loop over i */
      }  /* ends if vector2[a] > 0.0 */
   }  /* ends loop over a */

   denominator = 1.0 - sum_vector[0];
   for(i=1; i<LENGTH_OF_BELIEF_VECTOR; i++)
      vector1[i] = sum_vector[i]/denominator;
}  /* ends combine_using_dempsters_rule */



#define is_digit(x)   ((x >= '0' && x <= '9') ? 1 : 0)
#define is_blank(x)   ((x == ' ') ? 1 : 0)
#define to_decimal(x) (x - '0')
#define NO_ERROR  0
#define IO_ERROR -1
#define NULL2    '\0'

get_integer(n)
   int *n;
{
   char string[80];
   read_string(string);
   int_convert(string, n);
}

int_convert(ascii_val, result)
   char *ascii_val;
   int  *result;
{
   int sign = 1;   /* -1 if negative */

   *result = 0;    /* value returned to the calling routine */

   /* read past blanks */
   while (is_blank(*ascii_val))
      ascii_val++;              /* get next letter */

   /* check for sign */
   if (*ascii_val == '-' || *ascii_val == '+')
      sign = (*ascii_val++ == '-') ? -1 : 1;   /* find sign */

   /*
    * convert the ASCII representation to the actual
    * decimal value by subtracting '0' from each character.
    *
    * for example, the ASCII '9' is equivalent to 57 in decimal.
    * by subtracting '0' (or 48 in decimal) we get the desired
    * value.
    *
    * if we have already converted '9' to 9 and the next character
    * is '3', we must first multiply 9 by 10 and then convert '3'
    * to decimal and add it to the previous total yielding 93.
    */
   while (*ascii_val)
      if (is_digit(*ascii_val))
         *result = *result * 10 + to_decimal(*ascii_val++);
      else
         return (IO_ERROR);

   *result = *result * sign;
   return (NO_ERROR);
}

get_short(n)
   short *n;
{
   char string[80];
   read_string(string);
   short_convert(string, n);
}

short_convert(ascii_val, result)
   char  *ascii_val;
   short *result;
{
   int sign = 1;   /* -1 if negative */

   *result = 0;    /* value returned to the calling routine */

   /* read past blanks */
   while (is_blank(*ascii_val))
      ascii_val++;              /* get next letter */

   /* check for sign */
   if (*ascii_val == '-' || *ascii_val == '+')
      sign = (*ascii_val++ == '-') ? -1 : 1;   /* find sign */

   /* (conversion works as described in int_convert above) */
   while (*ascii_val){
      if (is_digit(*ascii_val)){
         *result = *result * 10 + to_decimal(*ascii_val++);
         if( (sign == -1) && (*result > 0))
            *result = *result * -1;
      }
      else
         return (IO_ERROR);
   }  /* ends while ascii_val */
   return (NO_ERROR);
}

get_long(n)
   long *n;
{
   char string[80];
   read_string(string);
   long_convert(string, n);
}

long_convert(ascii_val, result)
   char *ascii_val;
   long *result;
{
   int sign = 1;   /* -1 if negative */

   *result = 0;    /* value returned to the calling routine */

   /* read past blanks */
   while (is_blank(*ascii_val))
      ascii_val++;              /* get next letter */

   /* check for sign */
   if (*ascii_val == '-' || *ascii_val == '+')
      sign = (*ascii_val++ == '-') ? -1 : 1;   /* find sign */

   /* (conversion works as described in int_convert above) */
   while (*ascii_val)
      if (is_digit(*ascii_val))
         *result = *result * 10 + to_decimal(*ascii_val++);
      else
         return (IO_ERROR);

   *result = *result * sign;
   return (NO_ERROR);
}

get_float(f)
   float *f;
{
   char string[80];
   read_string(string);
   float_convert(string, f);
}

float_convert(ascii_val, result)
   char  *ascii_val;
   float *result;
{
   int count;        /* # of digits to the right of the
                        decimal point. */
   int sign = 1;     /* -1 if negative */
   double pow10();   /* Turbo C function */
   float  power();   /* function returning a value raised to
                        the power specified. */

   *result = 0.0;    /* value desired by the calling routine */

   /* read past blanks */
   while (is_blank(*ascii_val))
      ascii_val++;              /* get the next letter */

   /* check for a sign */
   if (*ascii_val == '-' || *ascii_val == '+')
      sign = (*ascii_val++ == '-') ? -1 : 1;   /* find sign */

   /*
    * first convert the numbers on the left of the decimal point.
    * if the number is 33.141592 this loop will convert 33.
    * (conversion works as described in int_convert above)
    */
   while (*ascii_val)
      if (is_digit(*ascii_val))
         *result = *result * 10 + to_decimal(*ascii_val++);
      else if (*ascii_val == '.')   /* start the fractional part */
         break;
      else
         return (IO_ERROR);

   /*
    * find the number to the right of the decimal point.
    *
    * if the number is 33.141592 this portion will return 141592.
    *
    * by converting a character and then dividing it by 10
    * raised to the number of digits to the right of the
    * decimal place, the digits are placed in the correct
    * locations.
    *
    *    4 / power(10, 2)  ==>  0.04
    */
   if (*ascii_val != NULL2) {
      ascii_val++;   /* move past the decimal point */
      for (count = 1; *ascii_val != NULL2; count++, ascii_val++)
         /*************************************************
          * The following change was made 16 June 1987.
          * For some reason the power function below
          * was not working.  Borland's Turbo C pow10
          * was substituted.
          *************************************************/
         if (is_digit(*ascii_val)){
            *result = *result +
               to_decimal(*ascii_val)/((float)(pow10(count)));
            /***********
            *result = *result +
               to_decimal(*ascii_val)/power(10.0, count);
            ************/
         }
         else
            return (IO_ERROR);
   }

   *result = *result * sign;   /* positive or negative value */
   return (NO_ERROR);
}

float power(value, n)
   float value;
   int   n;
{
   int   count;
   float result;

   if(n < 0)
      return(-1.0);

   result = 1;
   for(count=1; count<=n; count++){
      result = result * value;
   }
   return(result);
}

Listing 2  C Code to Implement Dempster's Rule of Combination

/*
 * This is the function that implements Dempster's rule
 * of combination.
 * vector1 and vector2 are belief vectors.  vector1 will
 * hold the result of the combination.
 */

#define LENGTH_OF_BELIEF_VECTOR 8

combine_using_dempsters_rule(vector1, vector2)
   float vector1[LENGTH_OF_BELIEF_VECTOR],
         vector2[LENGTH_OF_BELIEF_VECTOR];
{
   float denominator, sum_vector[LENGTH_OF_BELIEF_VECTOR];
   int   a, i, place;

   /* set the sums to zero */
   for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++)
      sum_vector[i] = 0.0;

   for(a=0; a<LENGTH_OF_BELIEF_VECTOR; a++){
      if(vector2[a] > 0.0){
         for(i=0; i<LENGTH_OF_BELIEF_VECTOR; i++){
            place = i & a;    /* bitwise AND = set intersection */
            if(vector1[i] > 0.0)
               sum_vector[place] = (vector1[i] * vector2[a])
                                      + sum_vector[place];
         }  /* ends loop over i */
      }  /* ends if vector2[a] > 0.0 */
   }  /* ends loop over a */

   /* normalize by the belief assigned to the null set */
   denominator = 1.0 - sum_vector[0];
   for(i=1; i<LENGTH_OF_BELIEF_VECTOR; i++)
      vector1[i] = sum_vector[i]/denominator;
}  /* ends combine_using_dempsters_rule */