home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
CP/M
/
CPM_CDROM.iso
/
mbug
/
mbug060.arc
/
CPM#006.LBR
/
DB2&3MAX.INF
< prev
next >
Wrap
Text File
|
1979-12-31
|
33KB
|
793 lines
Maximizing Performance with dBase II and III
by Alan Simpson
Reprinted from "Personal Systems",
The Journal of the San Diego Computer Society
Issues of November and December 1984 and January 1985
Typed and Formatted by Rick Yount, Oak Harbor, WA
(206) 675-9797
This article may be reprinted for any non-commercial purpose;
please credit the Author and "Personal Systems".
TABLE OF CONTENTS:
Trimming Minutes Down to Seconds ........... 1
Don't Resort to Re-Sorting ................. 2
Faster Sorts ............................... 4
Faster Searching ........................... 4
Faster Math ................................ 5
Faster Reports ............................. 6
Faster Copying ............................. 6
Faster Edits ............................... 7
Searching for Ranges ....................... 8
Estimating Performance ..................... 8
Managing Multiple Index Files .............. 9
Trade-Offs ................................ 10
Technical Rationale ....................... 12
Everyone wants the most out of their computer. Granted, it's
nice that the computer can trim down to minutes what usually
requires humans hours to perform. But then, nobody complains if
those computer minutes can be trimmed down to seconds. In this
paper we'll discuss and compare general techniques for maximizing
the performance of dBase. We'll trim some of those long dBase
processing minutes down to quick dBase seconds.
TRIMMING MINUTES DOWN TO SECONDS
We'll begin by benchmarking (comparing) a few techniques for
performing some basic tasks with dBase II. We'll use a simple
mailing list database as an example. Its name is MAIL.DBF, and it
has this structure:
FLD NAME TYPE WIDTH DEC
001 LNAME C 020 000
002 FNAME C 020 000
003 ADDRESS C 020 000
004 CITY C 020 000
005 STATE C 010 000
006 ZIP C 010 000
007 AMOUNT N 009 002
First, let's discuss various methods for sorting this database:
----------------------------------------------------------------
Maximizing dBase II and III Performance - by Alan Simpson
Page 1
DON'T RESORT TO RE-SORTING
There are basically four methods to maintain a sorted list of
items in a database. The first is to simply CREATE the MAIL
database, add records to it, then sort it with the SORT command.
For example, suppose you CREATE MAIL and APPEND 999 records to it.
To add another record to it, and maintain an alphabetical listing
by last name, you would need to use the commands:
USE MAIL
APPEND
SORT ON LNAME TO TEMP
This procedure would require about 940 seconds to re-sort a
database with 1000 records in it.
Another method would be to USE MAIL, APPEND the new record,
then INDEX the file on the LNAME field, using the commands below:
USE MAIL
APPEND
INDEX ON LNAME TO NAMES
This procedure requires about 530 seconds to re-sort the
1000-record database back into last name order
A third method would be to locate the position in the
database that the new record belongs, and INSERT a new record into
its proper alphabetical place, as below:
USE MAIL
LIST (to find the insertion point)
INSERT
The time required would be however long it takes you to find
the appropriate place to insert the new record, plus about 124
seconds for the INSERT command to rearrange the database.
The fourth method is to create an index file of the field to
sort on, and keep that index file active while adding the new
record(s). In most cases, the best time to index a file is
immediately after creating it, as in the commands below:
CREATE MAIL
USE MAIL
INDEX ON LNAME TO NAMES
It only takes about two seconds to create the index file when
the database is empty. At this point, the MAIL.DBF database and
the LNAME.NDX files exist on disk, but both are empty. To add new
data, you would need to make the NAMES.NDX file active by typing
in the commands:
----------------------------------------------------------------
Maximizing dBase II and III Performance - by Alan Simpson
Page 2
USE MAIL INDEX NAMES
If you always USE MAIL INDEX NAMES when you add new data, the
index file will automatically be updated, therefore the data will
always be sorted. So to add one record to the MAIL database with
999 records in it, you would type the commands:
USE MAIL INDEX NAMES
APPEND
There is no need to go through the INDEX ON procedure again,
because the NAMES.NDX file was active during the appending
procedure. The time required to automatically re-sort the index
file using this method is a scant 3 seconds. Add to that the
original two seconds for the INDEX ON command to create the
initial index file for a total of 5 seconds. Table 1 compares the
processing times for the four methods (using a 16-bit computer
with 256K RAM, floppies, and dBase II Version 2.4):
-------------------------------------------------------
Method Commands Used Time Required
1 USE, APPEND and SORT 940 seconds
2 USE, APPEND and INDEX ON 530 seconds
3 USE, INSERT 129 seconds
4 USE file1 INDEX file2 APPEND 5 seconds
-------------------------------------------------------
Table 1: Sorting Times; Four Different Approaches
Remember, you need to first create an index file based upon
the field(s) you wish the database to be sorted by. Use the INDEX
ON command to create the index (.NDX) file. For example, to
maintain an alphabetical listing of people on the MAIL database by
last name, type in the commands:
USE MAIL
INDEX ON LNAME TO NAMES
This creates and stores an index file called NAMES.NDX on
disk. To make the index file active, specify its name in the USE
command, as below:
USE MAIL INDEX NAMES
Once the file is made active in this way, any changes to the
database, whether they be through APPEND, EDIT, BROWSE, REPLACE,
PACK or READ will automatically re-sort and adjust the index file
accordingly.
Avoiding the re-sorting process on only one way in which
index files can greatly improve the speed of a dBase II software
system. Most types of processing that involve searching can also
be accelerated.
----------------------------------------------------------------
Maximizing dBase II and III Performance - by Alan Simpson
Page 3
FASTER SORTS
In some cases, a database should be physically sorted rather
than indexed. For example, the UPDATE command works best if the
file you are updating FROM is physically pre-sorted. You could
use the SORT command to create a sorted database called TEMP from
the MAIL database using the commands:
USE MAIL
SORT ON LNAME TO TEMP
This approach takes about 940 seconds, or about 15 minutes of
processing time. If the NAMES index file already exists, you can
achieve the same result using the commands:
USE MAIL INDEX NAMES
COPY TO TEMP
Copying the contents of the indexed file only required about
326 seconds of processing time; about one-third the time. When
you COPY an indexed file to another database, the records on the
database you copy to will be physically sorted.
FASTER SEARCHING
Let's assume that the MAIL.DBF database already has 1,000
records on it. Ten individuals on this database have the last
name (LNAME field) "Miller". The question is: Just how long
does it take to LIST, COUNT, or COPY all the "Miller"'s to another
database; or how long does it take to print a formatted REPORT
with only "Miller"'s, or to SUM the amounts for the "Miller"'s?
The answer, of course, is: It depends on how you do it.
For our benchmark comparisons, we'll assume that the database
has already been indexed on the LNAME field, using the commands:
USE MAIL
INDEX ON LNAME TO NAMES
Let's begin by comparing processing times using two different
command files and approaches. The first method will use the
standard LIST FOR approach to fish out all the "Miller"'s. The
command file looks like this:
********** Method 1: LIST FOR approach
ERASE
USE MAIL INDEX NAMES
ACCEPT " List all with what last name? " to SEARCH
LIST FOR LNAME=SEARCH
When you DO this command file, it clears the screen and
displays the prompt:
List all with what last name?
----------------------------------------------------------------
Maximizing dBase II and III Performance - by Alan Simpson
Page 4
Suppose you type in the name "Miller" and press the RETURN
key. The program will then display the records of all ten
"Miller"'s on the screen. The time required for the command file
to display the "Miller"'s and return to the dot prompt is about
148 seconds on a floppy disk system. Almost two and a half
minutes.
A second approach to solve this problem is to use the FIND
command to look up the first "Miller" in the NAMES index, then use
the WHILE option to display the remaining "Miller"'s in the
database, as in the program below:
********** Method 2: FIND and LIST WHILE approach
ERASE
ACCEPT " List all with what last name? " to SEARCH
FIND &SEARCH
LIST WHILE LNAME = SEARCH
Processing time for the second method to display all 10
"Miller"'s then redisplay the dot prompt on the screen, is less
than 9 seconds. Table 2 compares processing times for the two
different methods. Both methods perform exactly the same task,
but the processing times vary dramatically.
-------------------------------------------------------
Method Commands Used Time Required
1 LIST FOR 148.75 seconds
2 FIND and LIST WHILE 8.94 seconds
--------------------------------------------------------
Table 2: Comparison of Processing Times; Two Methods
FASTER MATH
The FIND and WHILE approach with indexed databases can offer
significant time savings with dBase commands other than LIST. For
example, if you want dBase to COUNT how many "Miller"'s are in the
database, you can use the commands:
USE MAIL
COUNT FOR LNAME = "Miller"
This approach requires about 55.5 seconds to display the fact
that there are 10 "Miller"'s in the database, then redisplay the
dot prompt. You can cut this time down significantly using the
commands:
USE MAIL INDEX NAMES
FIND Miller
COUNT WHILE LNAME = "Miller"
This approach performs the same task in about 3.20 seconds;
quite a significant time savings.
----------------------------------------------------------------
Maximizing dBase II and III Performance - by Alan Simpson
Page 5
Suppose you wish to SUM the AMOUNT field for just the
"Miller"'s? You could use these commands:
USE MAIL
SUM AMOUNT FOR LNAME = "Miller"
Processing time using this method is about 58 seconds. The
alternative approach uses these commands:
USE MAIL INDEX NAMES
FIND "Miller"
SUM AMOUNT WHILE LNAME = "Miller"
This approach takes 5.28 seconds; less than one-tenth the
time.
FASTER REPORTS
You can use the WHILE command with the REPORT command, also.
For example, suppose you've already created a formatted report
called MAILIST using the REPORT command. To display all the
"Miller"'s on the formatted report, you could use the commands:
USE MAIL
REPORT FORM MAILIST FOR LNAME = "Miller"
This approach requires about 135.3 seconds to display all the
"Miller"'s on the report, then redisplay the dot prompt. The
faster approach uses these commands:
USE MAIL INDEX NAMES
FIND Miller
REPORT FORM MAILIST WHILE LNAME = "Miller"
These commands perform the same job in a slim 14.03 seconds.
FASTER COPYING
For copying portions of the MAIL database to a database
called TEMP, the commands:
USE MAIL INDEX NAMES
COPY TO TEMP FOR LNAME = "Miller"
require a hefty 200.6 seconds; more than three minutes. You can
perform the same job using these commands:
USE MAIL INDEX NAMES
FIND Miller
COPY TO TEMP WHILE LNAME = "Miller"
These commands trim the copying time down to a comfortable
18.7 seconds.
----------------------------------------------------------------
Maximizing dBase II and III Performance - by Alan Simpson
Page 6
FASTER EDITS
Indexed files can also speed up the editing process. For
example, suppose you want to BROWSE through the database to edit
data for one of the "Miller"'s. One approach would be to type in
the commands:
USE MAIL
BROWSE
These commands will set up the dBase BROWSE screen with the
names and addresses in their original order, as displayed in
Figure 1:
LNAME--------------------FNAME------------------ADDRESS
Bond James 007 Spy St.
Kenney Dave 123 Clark St.
Newell Jeff 341 Lou Drive
Mohr Richard 350 W. Leadora St.
Rosiello Rick 999 Buddy Way
Wallace Doug 345 Killer St.
Miller Mike 601 Lemon Dr.
Figure 1: An unindexed BROWSE screen.
You will need to use lots of CTRL-C commands to scroll
through the database to find the "Miller" you're looking for.
There's no telling how long it might take you to find the
particular "Miller" you wish to edit, because the "Miller"'s will
be placed randomly throughout the database.
If, on the other hand, you use these commands to BROWSE:
USE MAIL INDEX NAMES
FIND Miller
BROWSE
The BROWSE screen will display the first "Miller" on the
database, and all the remaining "Miller"'s immediately beneath, as
shown in Figure 2:
LNAME--------------------FNAME------------------ADDRESS
Miller Mollie 601 Mission Blvd.
Miller Shiela 1234 Genessee
Miller Ms. Stephanie S. 734 Rainbow Dr.
Miller Patti 626 Mazda Way
Miller George P.O. Box 2802
Miller Julie 999 Love St.
Miller Caron 123 Princess Way
Miller Ms. Chrissie 321 Hynde St.
Miller Mrs. Sally S. 325 Seco Ct.
Miller Dr. James T. 701 Newport Dr.
Figure 2: A BROWSE screen with an indexed database
----------------------------------------------------------------
Maximizing dBase II and III Performance - by Alan Simpson
Page 7
No need to go scrolling through pages and pages of BROWSE
screens to find the "Miller" you wish to edit, because all ten
"Miller"'s are displayed immediately and simultaneously on one
BROWSE screen.
Similarly, if you wish to use the EDIT command to change the
data for a particular "Miller", you can use the commands:
USE MAIL INDEX NAMES
FIND Miller
EDIT WHILE LNAME = "Miller"
These commands will quickly display the first "Miller" in the
database on the EDIT screen. Each subsequent CTRL-C command will
immediately position you to the next "Miller" in the database. So
once again, you can save a great deal of time by not having to
scroll around and search for individual "Miller"'s.
SEARCHING FOR RANGES
The FIND and WHILE approach can also be used for searching
for ranges of data. For example, if a database had a DATE field
with dates stored in MM/DD/YY format, and the database was indexed
on the date field, the following command file would list all
records between the requested starting and ending dates:
USE file INDEX datefield
STORE " " TO START, FINISH
@ 2,2 SAY "Enter Start Date " GET START PICTURE "99/99/99"
@ 4,2 SAY "Enter End Date " GET FINISH PICTURE "99/99/99"
READ
FIND &START
LIST WHILE DATE <= FINISH
(This command file assumes that all dates have the same
year. It's a little trickier when the data is spread across
several years
ESTIMATING PERFORMANCE
Performance can be an important issue in developing a custom
dBase II software system. Generally, processing times tend to
increase linearly with the size of a database. Therefore, on a
database with 5,000 records in it, displaying all the "Miller"'s
could take as long as 740 seconds, a little over 12 minutes, with
the LIST FOR approach. Using an indexed file with the FIND and
LIST WHILE approach will perform the same task in about 45
seconds, less than a minute. Double those times for a database
with 10,000 record in it: A whopping 24 minutes (almost a half
hour) with the FOR approach, vs. 88 seconds (under a minute and a
half) with the FIND and WHILE approach.
----------------------------------------------------------------
Maximizing dBase II and III Performance - by Alan Simpson
Page 8
Keep in mind that any dBase command that allows the FOR
option will allow the WHILE option instead (e.g. LIST, DISPLAY,
COPY, REPORT, SUM, COUNT, REPLACE, DELETE, TOTAL). Therefore, any
of these processes can be greatly expedited with an index file and
the FIND and WHILE commands.
MANAGING MULTIPLE INDEX FILES
In the previous examples, we compared processing times using an
index file called NAMES. This index file contains only the LNAME
field. Realistically, a mailing list will probably require two
separate sort orders; 1) a sort by last name and first name for
printing a directory listing, and 2) a sort by zip code for bulk
mailings. In that case, you need to create two index files to
manage the two different sort orders. One index file, which we'll
call NAMES, will keep the mailing list data in last name and first
name order. A second index file, which we'll call ZIPS, will
maintain a zip code order for handling bulk mailings. To create
both of these index files, you'll need to first CREATE the
MAIL.DBF database with the CREATE command, then immediately create
the two index files using the commands:
USE MAIL
INDEX ON LNAME + FNAME TO NAMES
INDEX ON ZIP TO ZIPS
The MAIL.DBF database now has two index files associated with
it; NAMES.NDX and ZIPS.NDX. You can keep both index files active
by using the command:
USE MAIL INDEX NAMES,ZIPS
By specifying two index files in this fashion, all future
modifications to the database with the APPEND, EDIT, BROWSE, READ,
or REPLACE commands will * * AUTOMATICALLY * * update both
index files.
If you LIST or DISPLAY ALL the records in the database, they
will be displayed in last name order, since NAMES is the first-
listed index file in the INDEX portion of the command line.
Furthermore, you can only use the FIND command with the first
listed index, NAMES. Therefore, to display all the records in the
database in zip code order, you need not go through the INDEX ON
ZIP TO ZIPS procedure again. Instead, you can simply type in the
command:
SET INDEX TO ZIPS
This eliminates the time required to sort the file again.
Before adding new records or editing the database, be sure to use
the commands:
----------------------------------------------------------------
Maximizing dBase II and III Performance - by Alan Simpson
Page 9
USE MAIL INDEX NAMES,ZIPS
or
USE MAIL INDEX ZIPS,NAMES
This is required to make both index files active again.
Otherwise, you'll be likely to get a RECORD OUT OF RANGE error
sometime in the not too distant future.
TRADE-OFFS
There are some trade-offs to contend with when using multiple
index files. Generally, the more index files you have active at
any given moment, the longer it takes to perform an APPEND, EDIT
or REPLACE procedure. For example, if you want to add records to
the MAIL database without any active index files, you can just use
the commands:
USE MAIL
APPEND
As you type in each new individual's data, the screen will
immediately accept each new record and re-display the next APPEND
screen. However, if you decide to get carried away and create
four index files, and keep them all active as:
USE MAIL INDEX NAMES, ZIPS, CITIES, STATE
You will notice a definite delay between the time you fill one
APPEND screen and the appearance of the next blank APPEND screen.
On a large data file with over 1,000 records in it, the delay
could be as much as 20 seconds, depending on how many fields are
in each index file and the RAM capacity of your computer.
In general, one or two active index files are sufficient for
most databases. The delays caused by one or two active index
files are relatively insignificant, and are more than compensated
for by the time savings that the FIND and WHILE commands offer, as
well as by the time savings gained by avoiding re-indexing.
Another disadvantage to indexed files occurs during global
replaces. For example, if for some reason you wished to set all
the AMOUNT fields back to zero in your hypothetical MAIL database,
you could use the commands:
USE MAIL INDEX NAMES,ZIPS
REPLACE ALL AMOUNT WITH 0
On a database with 1000 records in it, this process could
easily take 45 minutes. However, the commands above waste
processing time by updating the index files when it is not
necessary to do so.
----------------------------------------------------------------
Maximizing dBase II and III Performance - by Alan Simpson
Page 10
Recall that the NAMES index contains the LNAME and FNAME
fields, and the ZIPS index contains the ZIP field. The AMOUNT
field is not used in either index file. Therefore you can use the
NOUPDATE option (supplied in dBase Versions 2.4 and higher) to
perform the replace. These commands:
USE MAIL INDEX NAMES,ZIPS
REPLACE NOUPDATE ALL AMOUNT WITH 0
perform the update in about six minutes. The NOUPDATE option
informs dBase that, even though there are two active index files,
there is no need to update them while performing this REPLACE
command.
There are some important points to keep in mind about the
INDEX and FIND commands. First, the FIND command only works on an
indexed field. If a database is in use with multiple index files,
the FIND command only works with the first listed index file. For
example, if you open the MAIL data base with the two index files
as below:
USE MAIL INDEX ZIPS, NAMES
the FIND command can only be used to locate a zip code.
If the data to look up in a database is stored in a variable,
then the variable name must be "macro-ized" to be used with the
FIND command, as below:
ACCEPT "Look up whom? " to SEARCH
FIND &SEARCH
Also, FIND does not support functions or operators. That is,
you cannot use the commands: FIND Miller .OR. Smith or FIND
!(&SEARCH) nor FIND LNAME > &SEARCH.
If you create two index files, but later add, edit, or delete
data with only one or neither of the index files active, the index
files will be corrupted, and you will most likely get a RECORD OUT
OF RANGE error at a later time. In this case, both index files
must be re-created by typing in the commands:
USE MAIL
INDEX ON LNAME + FNAME TO NAMES
INDEX ON ZIP TO ZIPS
or . . .
USE MAIL INDEX NAMES,ZIPS
REINDEX
Again, you can avoid the re-indexing by always keeping both
index files active with the command USE MAIL INDEX NAMES, ZIPS
when working with the database.
----------------------------------------------------------------
Maximizing dBase II and III Performance - by Alan Simpson
Page 11
TECHNICAL RATIONALE
From a technical standpoint, the reason that the FIND and
WHILE approach always dramatically outperforms the FOR approach is
quite simple. Whenever you use the FOR option to perform a
search, dBase always starts accessing the records from record
number 1 and reads every single record directly from disk.
Therefore, if you had a database with 10,000 names in it, ten of
which were "Miller", dBase would perform 10,000 disk accesses to
display the 10 "Miller"'s. Ten thousand disk accesses takes a
very, very long time.
Many unnecessary disk accesses can be avoided by the fact
that dBase always stores an active index file in RAM (at least, as
much of it as will fit in RAM). Furthermore, the index file is
always in sorted order in RAM.
When you use the FIND command with an index file, dBase finds
the first "Miller" in the index file in RAM, which requires no
disk accesses. Furthermore, the WHILE option only searches WHILE
(as long as) the search condition is true. Therefore, dBase will
stop searching as soon as it encounters the first non-"Miller" in
the index file. Since the index is already in sorted order,
"Miller"'s are all clumped together and disk accesses will stop as
soon as all the "Miller"'s have been displayed.
So the FOR command requires 10,000 disk accesses to display
the ten "Miller"'s, while the FIND and WHILE approach only
requires 10 disk accesses; thereby eliminating 9,990 unnecessary
disk accesses and about 20 or 30 minutes of time depending on the
size of the database, the amount of RAM your computer has, and the
speed of your disk drives.
- EOF -
----------------------------------------------------------------
Maximizing dBase II and III Performance - by Alan Simpson
Page 12
√