Text File | 1987-05-12 | 25KB | 569 lines
Tara Datafile Utilities, Version 2.0
USER'S MANUAL
by
David C. Oshel
Copyright (c) 1987 by David C. Oshel
ALL RIGHTS RESERVED
Private individuals are granted free license to copy and
distribute this complete set of Tara Datafile Utilities, provided
my copyright notice is not removed, and provided you distribute
both programs and documentation without charge.
Corporate or governmental use requires a license fee of $25.00.
Send check or money order payable to:
MicroConsulting Services
1219 Harding Avenue
Ames, IA 50010
I. Introduction
These files are included in Tara Datafile Utilities, version 2.0:
File           Purpose
-------------------------------------------------------------------------
BROWSE.BAT     demonstration, uses STARS.DAT
CRYPTIC.DOC    (doc)
CRYPTIC.EXE    sophisticated file encipher, decipher
ENTER.DOC      (doc)
ENTER.EXE      data entry for MailMerge-type datafiles
FIELD.DOC      (doc)
FIELD.EXE      selected fields, with format for browse
FNKEY.DOC      (doc)
FNKEY.EXE      assign macro strings to PC function keys
NAMES.FLD      sample field definition file for NAMES.DAT
NSORT.DOC      (doc)
NSORT.EXE      numeric sort on any field or fields
PICK.DOC       (doc)
PICK.EXE       select records on any field or fields
SETNAMES.BAT   define Fn keys for use with NAMES.DAT
SETSTARS.BAT   define Fn keys for use with STARS.DAT
STARS.DAT      demonstration data file, list of bright stars
STARS.FLD      field definition file for STARS.DAT
TARA.DOC       you're reading it
TSORT.DOC      (doc)
TSORT.EXE      text sort on any field or fields
The Tara Datafile Utilities are a set of small, powerful tools
which were originally intended just to make life with MailMerge a
little easier. Until now, information retrieval from this kind
of data file meant you had to use WordStar's Merge Print facility
to restrict, select out, and display your data. If you wanted to
sort (or resort) your data before using it, you were out of luck.
But with these tools, you now have direct access to your own data
and you no longer need to fire up WordStar and a printer just to
see what's what.
Most of these programs are simple, one-task tools. They are used
in combination with each other and with the MS-DOS command line
i/o redirection facility. In general, Tara Datafile Utilities
allow you to:
a) restrict the mass of data you have to the few records and
fields you actually need to look at, based on the contents of any
field or combination of fields;
b) sort this smaller subset of information in either ascending
or descending order, on either text or numeric data, on any field
or combination of fields;
c) project the results to another file for use by WordStar or
some other program, or to the screen, where you may view selected
and/or formatted fields at leisure.
Tara Datafile Utilities also include:
d) a quick data entry module, using a field definition file which
you create;
e) a sophisticated file encryption program;
f) a program which assigns macro strings to PC function keys.
###
II. MS-DOS i/o redirection and pipes
You should have a firm grasp of what "i/o redirection" is all
about in order to use the Tara Datafile Utilities. (There is
also a brief discussion of this issue in your MS-DOS manual.)
In general, "i/o" means INPUT TO and OUTPUT FROM any particular
program. Most commonly, input comes from your computer keyboard
and output goes to your computer screen.
Less commonly, input may come from a text file, and the stream of
characters from that file is treated exactly as though someone
were rapidly typing on the computer's keyboard.
Similarly, output may be sent to some other destination than the
computer screen -- either to a line printer or to a text file.
Less commonly still, the OUTPUT from one program can be the INPUT
to another program! This is called a "pipe".
So, the three things to be aware of are 1) redirected input, 2)
redirected output, and 3) pipes. These three demons are invoked
on the MS-DOS command line, and nowhere else.
The DEFAULT input is "<CON:" and the DEFAULT output is ">CON:".
You do not need to specify the DEFAULT input or output on any
command line. However...
Input from a FILE has the form "<file1.dat" and output to a file
has the form ">file2.dat". Output to the line printer would
typically be written as ">PRN:", but you may also see ">LPT1:" or
">LPT2:" for parallel printers, or ">COM1:" for serial printers or
modems.
A PIPE is specified with the vertical bar character, "|",
surrounded by spaces fore and aft, " | ", between two COMMANDS:
C>dir | sort | more
This example comes entirely from MS-DOS; i.e., "dir", "sort" and
"more" are all MS-DOS commands, and they are discussed in your
MS-DOS manual.
Note that MS-DOS sort typically REQUIRES i/o redirection:
C>sort <inputfile >outputfile
Note also that the SOURCE file and the DESTINATION file must NOT
be the SAME file! I.e., this is a serious error which typically
destroys your source document:
*** DANGER *** C>sort <abc.dat >abc.dat
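The reason the DANGER example destroys the file is that the command interpreter creates (and empties) the output file before SORT has read a single line of input. A minimal sketch of the safe alternative follows: sort into a scratch file, and only then let it replace the original. Python is used purely for illustration, and the name sort_file_safely is an assumption of this sketch, not part of MS-DOS or the Tara utilities.

```python
import os
import tempfile


def sort_file_safely(path):
    """Sort the lines of a text file without ever reading and
    writing the same file at the same time."""
    # Read the whole source first, while it is still intact.
    with open(path, "r") as src:
        lines = src.read().splitlines()

    # Case-insensitive text sort (an assumption for this sketch).
    lines.sort(key=str.lower)

    # Write the result to a scratch file in the same directory ...
    directory = os.path.dirname(os.path.abspath(path))
    fd, scratch = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as dst:
        for line in lines:
            dst.write(line + "\n")

    # ... and only then let it take the place of the original.
    os.replace(scratch, path)
```

On the command line the same idea is simply two steps: sort <abc.dat >abc.tmp, check the result, then replace abc.dat with abc.tmp.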
###
III. Features common to all Tara Datafile Utilities
A. Record type is MailMerge-compatible
All Tara Datafile Utilities assume that the data they work with
is compatible with MicroPro, Inc.'s WordStar program -- in
particular with Merge Print (a.k.a. MailMerge in older versions).
That means, in general, that records consist of a single line of
characters terminated with a carriage return/line feed pair
("newline"), and that fields within each record are delimited by
commas, except for the last field. If a field contains a comma,
the field is enclosed in double quotation marks. If a field
contains both a comma and quotes, the field is enclosed in
apostrophes. If neither of these quotation schemes is adequate
to mark off the fields in the record, the delimiting character
can be changed from comma to something else.
For example:
"Oshel, Ph.D.",David,C.,1219 Harding,Ames,IA,50010,,,yes
There are THREE fields following the zip code comma in this
example. The last field is terminated by the same newline pair
that terminates the record. The first field contains a comma, so
that field is quoted.
Tara Datafile Utilities refer to the fields in a record by FIELD
NUMBER. The nth field in each record is to the immediate left of
the nth comma (or chosen delimiting character). In the example,
the word "Harding" lies in the fourth field, there is no home
phone number in the eighth field, and the word "yes" occupies the
last, tenth field. The ENTER program provides field numbers for
ready reference -- you don't HAVE to count your commas! You do
have to refer to fields by the numbers.
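To make the quoting and field-numbering rules concrete, here is a minimal sketch of a record splitter, written in Python for illustration only. It is not the code the Tara utilities use. The names split_record and field are hypothetical, and the sketch assumes that a quote mark only opens a quoted field when it is the first character of that field.

```python
def split_record(line, delimiter=","):
    """Split one MailMerge-style record into a list of fields."""
    line = line.rstrip("\r\n")      # drop the newline pair
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] in "\"'":
            # Quoted field: runs to the matching quote character.
            quote = line[i]
            end = line.find(quote, i + 1)
            if end == -1:           # unterminated quote: take the rest
                end = n
            fields.append(line[i + 1:end])
            i = line.find(delimiter, end)
            if i == -1:             # quoted field was the last field
                break
            i += 1
        else:
            # Plain field: runs to the next delimiter or end of line.
            end = line.find(delimiter, i)
            if end == -1:
                fields.append(line[i:])
                break
            fields.append(line[i:end])
            i = end + 1
    return fields


def field(fields, number):
    """Return field NUMBER, counting from 1 as the utilities do."""
    return fields[number - 1]
```

Applied to the sample record above, this yields ten fields: field 4 is "1219 Harding", field 8 is empty, and field 10 is "yes". Passing a different delimiter plays the role of the -F switch described later.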
There is NO DISTINCTION between upper or lower case in any of the
Tara Datafile Utilities. This applies generally and everywhere.
However, upper and lower case IN DATA are preserved when data is
written to a new file or to the screen.
All of these utilities STRIP THE HIGH BIT out of characters when
writing data to standard output. This does not alter the source
data, but does produce a file which contains no "negative ASCII"
characters -- on IBM PCs, these are the letters with diacritical
marks, box characters, Greek letters, etc. WordStar uses them to
hide formatting information in DOCUMENT MODE text.
This fixup allows PICK, TSORT and NSORT to work with data that
was inadvertently edited in WordStar document mode. You can
detect "funny letters" in your data by using the MS-DOS TYPE
command to examine your file. Negative ASCII characters would
prevent the pattern-matching utilities from finding "obvious"
matches. Tara Datafile Utilities do not alter your source data,
unless you write it back to the original file through a pipe.
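Stripping the high bit simply means clearing the top bit of every byte, so that each character falls back into the plain 7-bit ASCII range. A one-function sketch in Python (illustrative only; the name strip_high_bit is an assumption):

```python
def strip_high_bit(data):
    """Clear bit 7 of every byte, leaving 7-bit ASCII."""
    return bytes(b & 0x7F for b in data)
```

For example, the bytes C1 62 E3 (hex) come out as the plain text "Abc".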
Notice that MailMerge-type datafiles are somewhat idiosyncratic.
Another useful definition of an "ASCII data file" is that fields
with text data are quoted, fields containing quotes double the
quotation character (e.g., """" defines a field which contains a
single " character!), but numeric data is not quoted. Some
programs that support some version of this other definition are
BASIC and R:Base 5000. The difference lies in how to handle the
ambiguity caused by quoting the quote character. WordStar tries
to duck the issue by adding ' as another quote character.
However, there is no hard and fast rule or widely held standard.
These other formats are chiefly used only to import and export
data to and from expensive data base management programs, which
do not themselves use the format internally. "MailMerge format"
is extremely and immediately useful (to WordStar as well as other
programs) so that is the format chosen here. Data imported from
R:Base 5000 or dBase III ("DELIMITED BY ,") will probably be
acceptable to Tara Datafile Utilities, and probably acceptable to
WordStar Merge Print, but you may need to do some preliminary
fixing up. (If you, like me, have either of those pricey
database programs you're probably using Tara for the same reason
I do -- Tara is quick and dirty and easy to live with. But I
wouldn't do a fancy job with just these simple tools. Yet...!)
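For comparison, the other "ASCII data file" convention mentioned above, where a quote inside a quoted field is doubled, happens to be the one Python's standard csv module reads by default. The sketch below uses that module as a stand-in; it is not how BASIC or R:Base 5000 work internally, and the sample record is made up.

```python
import csv


def parse_doubled_quote_record(line):
    """Parse one record in which quotes inside a field are doubled."""
    reader = csv.reader([line], delimiter=",", quotechar='"',
                        doublequote=True)
    return next(reader)
```

Given the record "Smith","say ""hi""",42,"""" this returns four fields, the last of which is a single " character. A MailMerge-style reader would have to treat the same line quite differently, which is the "preliminary fixing up" referred to above.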
This kind of record has one distinct advantage -- it is variable
length -- and one obvious disadvantage -- it takes longer to get
information out of the file (especially numeric information). In
general, MailMerge-type data files should not contain more than a
few hundred records, or processing time will be tedious. But
you can always PICK a smaller set of data from a larger file.
The largest record Tara Datafile Utilities can handle contains
4095 characters (with any number of fields within that limit).
###
B. Switches
The "-H" or "/H" switch always invokes a help screen for each of
the utilities.
Examples:  C>enter -h
           C>pick /h
           C>fnkey -h
The "-F" or "/F" switch changes the field delimiter character
from comma to something else, in those utilities that read data.
Examples:  C>pick -f\ <abc.dat 1 has aardvark
           C>field <temp /f* /s. 1 12 3 8: | more
           C>nsort "-f|" 16 d
But:       C>fnkey -f9 "pick -f\ <abc.dat"
The "-S" or "/S" switch only occurs in FIELD.EXE, and is used to
change the fill character from BLANK to something else when
right- or left-justifying formatted output.
The "-E" or "/E", and "-D" or "/D", switches are used in CRYPTIC
and nowhere else, to indicate mode: encipher or decipher.
All program switches begin with either hyphen or slash. Switches
alter the usual behavior of the utility in some way. There are
only one or two switches at most for each utility, in addition to
the help switch.
Undefined switches usually, but not always, invoke the utility's
help message. FNKEY will not try to interpret any switches but
its own, viz., -H and the first instance of -F. Negative number
arguments are never interpreted as switches.
Note that a switch which contains <, |, or > must be QUOTED.
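The switch rules above can be summed up in a few lines. The following Python sketch is only a guess at the logic, not the utilities' actual code; the name split_switch is hypothetical, and "negative number" is taken here to mean a hyphen followed directly by a digit or a decimal point.

```python
def split_switch(arg):
    """Return (LETTER, value) for a switch argument, else None."""
    if len(arg) < 2 or arg[0] not in "-/":
        return None                 # ordinary argument
    if arg[0] == "-" and (arg[1].isdigit() or arg[1] == "."):
        return None                 # negative number, not a switch
    # Case is ignored; anything after the letter is the switch value.
    return arg[1].upper(), arg[2:]
```

So "-f\" gives ("F", "\"), "/h" gives ("H", ""), and "-12.5" is left alone as a number.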
###
C. Standard Output
All of the utilities support redirected i/o using standard input
and standard output. This feature is not especially useful with
ENTER or FNKEY but it's there if you can figure out a use for it.
("Standard input" and "standard output" refer to the method which
a program uses to receive or transmit data. The "standard" way
to do things supports MS-DOS i/o redirection and pipes. The
various non-standard i/o schemes used by almost all commercial
programs are much faster ... and also much less flexible.)
###
D. ANSI.SYS, only for FNKEY.EXE
None of these utilities need to have ANSI.SYS installed on your
system -- except for FNKEY, a program that assigns meanings to
your PC's function keys. See your MS-DOS manual for instructions
on installing ANSI.SYS. What you need to do is place the command
DEVICE=ANSI.SYS into the CONFIG.SYS file on your startup disk.
###
E. Numeric and dollar data types
Utilities that recognize numeric data will correctly recognize
the dollar format, e.g., $123,456.78. A minus sign may appear
anywhere in its proper scan field. All numeric data is converted
to floating point for purposes of comparison. Scientific formats
like "1.3e-2" are supported, but "2e" is not considered a number
(because there is no digit following the E); the exponentiation
operator may be either E or D. Scientific and dollar notations
are mutually incompatible in the same field. The FIELD utility
relaxes the strict interpretation of what is a number, and allows
multiple hyphens, parentheses and slash -- this allows data like
social security numbers and phone numbers to be right-justified
in formatted output. All non-numeric data is Text.
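As a rough illustration of these rules, here is a Python sketch of a number recognizer. It is not the routine the utilities use. The name to_number is hypothetical, and the sketch makes two assumptions the text does not spell out: a minus sign is accepted in front of the number, on either side of the dollar sign, or in the exponent; and commas are accepted only in dollar amounts.

```python
import re

# Plain or scientific notation; the exponent letter may be E or D.
_PLAIN = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([ed][+-]?\d+)?",
                    re.IGNORECASE)
# Dollar notation, with or without thousands commas, no exponent.
_DOLLAR = re.compile(r"(-\$|\$-|\$)(\d{1,3}(?:,\d{3})+|\d+)(\.\d+)?")


def to_number(text):
    """Return the field as a float, or None if it is Text."""
    s = text.strip()
    m = _DOLLAR.fullmatch(s)
    if m:
        value = float(m.group(2).replace(",", "") + (m.group(3) or ""))
        return -value if "-" in m.group(1) else value
    if _PLAIN.fullmatch(s):
        # Python's float() only knows E, so map D onto it.
        return float(s.lower().replace("d", "e"))
    return None
```

With this sketch "$123,456.78", "1.3e-2" and "1.3D-2" are numbers, while "2e", "$1.3e2" and "Ames" are Text.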
###
IV. The CRYPTIC Program
Usage:     C>cryptic [ -D | -E ] password <inputfile >outputfile
Examples:  C>cryptic absinthe <myfile.dat >coded.dat
           C>cryptic -D absinthe <coded.dat | more
The -D switch selects Decipher Mode. The -E switch selects
Encipher Mode, and is the default mode. You must supply the same
password each time you use Cryptic with a particular file.
*** WARNING: Do NOT forget your password! ***
This is a simple "filter" which scrambles the contents of text
files. Coded files are indecipherable by normal means. To
decode the scrambled file, run it through CRYPTIC once again
using the same password as before.
If you forget your own password, you're out of luck. The
password is not recoverable. Remember it! You should keep a
backup copy of any data file you use, in your possession and off
the premises. This anticipates the problem that someone might
maliciously encrypt your data for you.
This utility provides a first level of data security only. It
will prevent unauthorized access by average persons, but will not
withstand expert analysis.
Be advised that your physical disk medium probably retains an
image of some part of your plain text, even if you have erased
the file, unless you reformat the disk (after copying your coded
data to another disk, of course!).
Using a PIPE with plain text in it will also leave a transient
image on physical disk media! If security is crucial, use your
floppy drive and be sure to format the working diskette after a
session. This physical data image, not associated with any file,
can be viewed (and security compromised!) by any number of garden
variety programs including Norton Utilities and PC-Tools.
The encryption algorithm used here is more sophisticated than the
usual "xor" type of scramble. In its day, this cipher could not
be cracked, but no doubt things have changed a bit since the
Crimean War.
To illustrate the potential difficulty of cracking this cipher,
Tables 1 and 2 compare the frequency of occurrence of characters
found in a plain-text data file, against the frequency found in
an especially cryptic version of the same data.
The enciphered data file was run through Cryptic four times using
a different password each time. Ciphered data has "smeared" over
the range of all printable characters, while the gap between most
and least frequent characters is far less than in the plain text.
Both the plain text and the ciphered data contain 117 records and
a total of 8,602 printing characters. Tables follow.
Table 1. Frequencies of 79 characters found in a plain text
data file delimited by commas.
, = 1088, 12.6482% R = 48, 0.5580%
(blank) = 563, 6.5450% k = 46, 0.5348%
e = 532, 6.1846% v = 46, 0.5348%
r = 339, 3.9409% P = 41, 0.4766%
o = 335, 3.8944% H = 40, 0.4650%
a = 314, 3.6503% f = 40, 0.4650%
0 = 302, 3.5108% O = 40, 0.4650%
n = 275, 3.1969% ) = 37, 0.4301%
s = 268, 3.1156% ( = 37, 0.4301%
t = 260, 3.0226% b = 34, 0.3953%
2 = 251, 2.9179% * = 32, 0.3720%
i = 241, 2.8017% L = 30, 0.3488%
1 = 207, 2.4064% U = 28, 0.3255%
A = 204, 2.3715% W = 27, 0.3139%
l = 173, 2.0112% J = 24, 0.2790%
5 = 170, 1.9763% G = 23, 0.2674%
3 = 168, 1.9530% N = 23, 0.2674%
m = 139, 1.6159% E = 22, 0.2558%
C = 138, 1.6043% & = 20, 0.2325%
d = 129, 1.4997% T = 19, 0.2209%
4 = 122, 1.4183% x = 17, 0.1976%
- = 119, 1.3834% F = 16, 0.1860%
I = 118, 1.3718% ? = 16, 0.1860%
S = 110, 1.2788% / = 14, 0.1628%
9 = 104, 1.2090% K = 12, 0.1395%
8 = 104, 1.2090% V = 11, 0.1279%
c = 101, 1.1741% Y = 10, 0.1163%
u = 100, 1.1625% # = 7, 0.0814%
h = 94, 1.0928% z = 6, 0.0698%
6 = 92, 1.0695% ; = 4, 0.0465%
7 = 88, 1.0230% : = 2, 0.0233%
p = 79, 0.9184% ' = 2, 0.0233%
M = 69, 0.8021% X = 1, 0.0116%
. = 66, 0.7673% ! = 1, 0.0116%
y = 66, 0.7673% Z = 1, 0.0116%
D = 62, 0.7208% j = 1, 0.0116%
" = 62, 0.7208% q = 1, 0.0116%
B = 58, 0.6743% \ = 1, 0.0116%
g = 58, 0.6743% Q = 1, 0.0116%
w = 53, 0.6161%
Table 2. Frequencies of 95 characters found in a 4-ply cipher of
the same data.
0 = 155, 1.8019% p = 90, 1.0463%
. = 131, 1.5229% K = 89, 1.0346%
6 = 129, 1.4997% m = 88, 1.0230%
, = 127, 1.4764% w = 88, 1.0230%
% = 123, 1.4299% D = 87, 1.0114%
' = 123, 1.4299% H = 87, 1.0114%
# = 122, 1.4183% e = 86, 0.9998%
2 = 121, 1.4066% g = 86, 0.9998%
* = 121, 1.4066% 5 = 85, 0.9881%
+ = 119, 1.3834% o = 84, 0.9765%
v = 118, 1.3718% M = 83, 0.9649%
x = 117, 1.3601% ^ = 83, 0.9649%
( = 116, 1.3485% G = 83, 0.9649%
| = 116, 1.3485% n = 81, 0.9416%
8 = 115, 1.3369% > = 81, 0.9416%
3 = 114, 1.3253% ; = 80, 0.9300%
4 = 113, 1.3136% { = 79, 0.9184%
! = 113, 1.3136% J = 79, 0.9184%
~ = 112, 1.3020% Q = 78, 0.9068%
} = 111, 1.2904% N = 76, 0.8835%
" = 110, 1.2788% X = 76, 0.8835%
= = 109, 1.2671% a = 75, 0.8719%
r = 109, 1.2671% f = 75, 0.8719%
z = 109, 1.2671% l = 73, 0.8486%
C = 107, 1.2439% ` = 72, 0.8370%
) = 106, 1.2323% F = 71, 0.8254%
u = 106, 1.2323% Z = 71, 0.8254%
A = 105, 1.2206% j = 70, 0.8138%
& = 104, 1.2090% h = 69, 0.8021%
< = 103, 1.1974% \ = 69, 0.8021%
? = 101, 1.1741% k = 67, 0.7789%
E = 101, 1.1741% c = 67, 0.7789%
/ = 100, 1.1625% V = 66, 0.7673%
9 = 100, 1.1625% T = 65, 0.7556%
: = 97, 1.1276% d = 64, 0.7440%
$ = 96, 1.1160% L = 64, 0.7440%
y = 95, 1.1044% O = 64, 0.7440%
i = 94, 1.0928% U = 63, 0.7324%
1 = 92, 1.0695% [ = 61, 0.7091%
t = 92, 1.0695% R = 60, 0.6975%
@ = 92, 1.0695% S = 59, 0.6859%
(blank) = 92, 1.0695% W = 56, 0.6510%
q = 92, 1.0695% P = 56, 0.6510%
- = 91, 1.0579% Y = 55, 0.6394%
B = 91, 1.0579% b = 55, 0.6394%
I = 90, 1.0463% _ = 54, 0.6278%
7 = 90, 1.0463% ] = 52, 0.6045%
s = 90, 1.0463%
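A tally like the ones in Tables 1 and 2 takes only a few lines of code. The following Python sketch is not the program that produced the tables. It assumes that "printing characters" means the 95 ASCII characters from blank through ~, which agrees with the count of 95 in Table 2; the function names are hypothetical.

```python
from collections import Counter


def frequency_table(text):
    """Return (char, count, percent) rows, most frequent first."""
    # Count only printing ASCII, blank (32) through tilde (126).
    counts = Counter(ch for ch in text if " " <= ch <= "~")
    total = sum(counts.values())
    # Sort by descending count; ties are broken by character code.
    rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(ch, n, 100.0 * n / total) for ch, n in rows]


def format_row(ch, n, percent):
    """Format one row in the style of the tables above."""
    return "%s = %4d, %8.4f%%" % (ch, n, percent)
```

Running frequency_table over a plain data file and over its enciphered version gives the two columns of figures to compare: a flat distribution is what makes simple frequency analysis harder.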
###
V. Errors and problems:
Problem: Nothing happens, the computer just sits there.
You have forgotten to name the <input file. As a result, the
program is correctly (but stupidly) waiting for you to type
characters on the keyboard. (Same as <CON:)
Solution: Type Ctrl-C or Ctrl-Break.
This problem does not occur in Version 2.0.
Problem: File not found.
You have not spelled the name of your <input file correctly, or
you have given an incorrect or incomplete path name.
Solution: Check the directory for exact spelling.
Problem: No result.
The first thing that should suggest itself is an empty data file.
As unlikely as this seems, it might be true. Your directory will
indicate 0 bytes for the file size, if this is the case. (When
you see the evidence, you will remember how you caused the
problem yourself sometime last week.) A doctor I know created
this situation by using WordStar to split data into two files; he
then DELETED all the records from the original file instead of
erasing it. It can take quite a while to diagnose this kind of
error, so go "by the book" -- check your file size as a matter of
course.
Secondly, you might be asking for something that does not exist.
Either PICK cannot find an exact match in a field, or FIELD
cannot find the 6th field in records that only have 5 fields,
etc.
Another possibility: you want PICK to find the pattern 150, but
PICK thinks 150 is a field number! Use PICK 0 HAS 150 instead
of PICK 150. PICK 0 150 is also valid.
Solution: Check your directory. Check your spelling.
Try to PICK using the HAS operator instead of EQ.
Problem: Output contains garbage from a help screen.
You have run the output from a utility through a pipe into FIELD.
However, a command early in the chain bombed out and printed its
help message into the pipe. When FIELD got it, it tried to
format the output. The result is bits and pieces of a help
screen, possibly TSORT-ed!
Solution: Put complex commands in a batch file. Use FNKEY.
Problem: No room on the disk
MS-DOS pipes are actually FILES -- hidden, system-level files,
but files nonetheless. Every pipe you create will demand disk
space until its job is done. In addition, MS-DOS allocates disk
space in whole CLUSTERS, so even a very small file occupies at
least one entire cluster no matter how few bytes it contains.
Cluster size depends on the disk; the figures here assume 8k
clusters. Compare available disk space before and after deleting
a very small file. Are you surprised by the result? Erasing one
small file frees a full 8k of disk space! So a command chain
that uses an input file, a pipe, and an output file can demand up
to THREE TIMES the size of the original file. And any fractional
part of a cluster costs a whole cluster -- that is, a file with
8,193 actual bytes takes up 16k of space! Your reaction to this
bit of news is probably the same as mine was.
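The cluster arithmetic is easy to check for yourself. Here is a small Python sketch; the 8k cluster size is the figure used in this section and will differ on other disks, and the function name allocated_bytes is hypothetical. The sketch only covers files of at least one byte.

```python
def allocated_bytes(file_size, cluster_size=8192):
    """Disk space taken by a file, rounded up to whole clusters."""
    if file_size < 1:
        raise ValueError("sketch covers non-empty files only")
    # Ceiling division: a partial cluster costs a whole one.
    clusters = -(-file_size // cluster_size)
    return clusters * cluster_size
```

With 8k clusters, a 1-byte file takes 8,192 bytes, a file of exactly 8,192 bytes takes the same, and a file of 8,193 bytes takes 16,384. A chain with an input file, a pipe and an output file needs roughly three such allocations at once.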
Solution: Erase some files and try again. Get a hard disk.
Get <input from Drive B, while logged onto Drive A.
###