USAGE AND RECOMPILATION DOCUMENTATION FOR: 8/29/81
SQ.COM 1.5 File squeezer
USQ.COM 1.5 File unsqueezer
FLS.COM 1.1 Ambiguous file name expander
DISTRIBUTION RIGHTS:
I allow unrestricted non-profit distribution of this
software and invite users groups to spread it around.
However, any distribution for profit requires my permission
in advance. This applies only to the above listed programs
and their program source and documentation files. I do sell
other software.
PURPOSE:
The file squeezer, SQ, compresses files into a more compact
form. This provides:
1. Faster transmission by modem.
2. Fewer diskettes to distribute a program package.
(Include USQ.COM and instructions, both unsqueezed.)
3. Fewer diskettes for archival storage.
Any file can be squeezed, but program source files and text
files benefit the most, typically shrinking by 35%. Files
containing only a limited character set, such as dictionary
files, may shrink as much as 48%. Squeezed files look like
gibberish and must be unsqueezed before they can be used.
The unsqueezer, USQ, expands squeezed files into exact
duplicates of the original or provides a quick, unsqueezed
display of the tops of (or all of) squeezed files.
Unsqueezing requires only a single pass.
Both SQ and USQ accept batches of work specified by lists of
file names (with drives if needed) and miscellaneous
options. They accept these parameters in any of three ways:
1. On the CP/M command line.
2. From the console keyboard.
3. From a file.
The FLS program can be used (on the same command line!) to
expand parameter lists containing wild-card (ambiguous) file
names into lists with the specific file names required by SQ
and USQ.
This combination of programs allows you to issue a single
command which will produce many squeezed or unsqueezed files
from and to various diskettes. For example, to unsqueeze all
squeezed ASM files on drive B and send the results to drive
C and also unsqueeze all squeezed TXT files on drive A and
send the results to drive D:
A>fls c: b:*.aqm d: *.tqt |usq
For detailed instructions see USAGE.
This DOES run under plain old vanilla CP/M! Many of the
smarts are buried in the COM files in the form of library
routines provided with the BDS C package (available from
Lifeboat).
The above example simulates a "pipe" (indicated by the "|")
by sending the "console" output of the fls.com program to a
temporary file and then running the usq.com program with
options which cause it to read its parameters from its
"console" input, which is really redirected to come from the
temporary file.
Note that programs written in BDS C tend to be GOable. That is,
if you do A>save 0 GO and run a C program (just one - no pipes)
then you can rerun it without reading it from disk by using GO
as its name and giving the usual parameters. This works because
BDS C doesn't support initialized static variables. The program
has to initialize everything dynamically, so it cleans up for
each rerun.
THEORY:
The data in the file is treated at the byte level rather
than the word level, and can contain absolutely anything.
The compression is in two stages: first repeated byte values
are compressed and then a Huffman code is dynamically
generated to match the properties of each particular file.
This requires two passes over the source data.
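The first stage (compressing repeated byte values) can be
sketched as follows. This is only an illustration of the idea,
not the SQ file format: the marker value DLE, the minimum run
length of 3 and the function names are all assumptions made for
this example. The real encoding is described in the source
files.

```python
DLE = 0x90  # assumed marker byte for this sketch only


def rle_encode(data: bytes) -> bytes:
    """Replace runs of 3 or more equal bytes by (byte, DLE, count)."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        run = 1
        # Measure the run, capped so the count fits in one byte.
        while i + run < len(data) and data[i + run] == byte and run < 255:
            run += 1
        if byte == DLE:
            # A literal marker is escaped as (DLE, 0), once per occurrence.
            out += bytes([DLE, 0]) * run
        elif run >= 3:
            out += bytes([byte, DLE, run])
        else:
            out += bytes([byte]) * run
        i += run
    return bytes(out)


def rle_decode(packed: bytes) -> bytes:
    """Invert rle_encode."""
    out = bytearray()
    i = 0
    while i < len(packed):
        byte = packed[i]
        if byte == DLE:
            count = packed[i + 1]
            if count == 0:
                out.append(DLE)  # escaped literal marker
            else:
                # The repeated byte was already emitted once.
                out += bytes([out[-1]]) * (count - 1)
            i += 2
        else:
            out.append(byte)
            i += 1
    return bytes(out)
```

With these assumptions, rle_encode(b"aaaaab") gives the four
bytes b"a\x90\x05b", and rle_decode restores the original.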
The decoding table is included in the squeezed file, so
squeezing short files can actually lengthen them. Fixed
decoding tables are not used because English and various
computer languages vary greatly as to upper and lower case
proportions and use of special characters. Much of the
savings comes from not assigning codes to unused byte
values.
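To see why unused byte values cost nothing, here is a minimal
sketch of the textbook Huffman construction, which derives a
code length for each byte value that actually occurs in the
data. It is not taken from the SQ source; the function name is
hypothetical and the details of SQ's own tables may differ.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict:
    """Return {byte value: code length in bits} for bytes present in data."""
    counts = Counter(data)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): 1}
    # Heap entries are (weight, tiebreak, {byte: depth so far}).
    # The tiebreak is the smallest byte in the subtree, so it is unique.
    heap = [(weight, byte, {byte: 0}) for byte, weight in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, t1, d1 = heapq.heappop(heap)
        w2, t2, d2 = heapq.heappop(heap)
        # Joining two subtrees pushes every leaf one level deeper.
        merged = {b: depth + 1 for b, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, min(t1, t2), merged))
    return heap[0][2]
```

For the input b"aaaabbc" this yields lengths of 1 bit for "a"
and 2 bits each for "b" and "c". The other 253 byte values get
no entry at all, which is where much of the saving comes from.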
More detailed comments are included in the source files.
USAGE TUTORIAL:
As usual, you have to learn how to tell the programs what to
do (i.e., what parameters to type after the program name).
First I will introduce the various possibilities by example.
Then I will summarize the rules.
In the simplest case either SQ or USQ can simply be given
one or more file names (with or without drive names):
A>sq xyz.asm
A>sq thisfile.doc b:thatfile.doc
will create squeezed files xyz.aqm, thisfile.dqc and
thatfile.dqc, all on the current drive, A. The original
files are not disturbed. Note that the names of the squeezed
files are generated by rules - you don't specify them.
Likewise,
A>usq xyz.aqm
will create file xyz.asm on the A drive, overwriting the
original. (The original name is recreated from information
stored in the squeezed version.) The squeezed version is not
disturbed.
Each file name is processed in order, and you can list all
the files you can fit in a command. The file names given to
SQ and USQ must be specific. You will learn below how to use
the FLS program to expand patterns like *.asm (all files of
type asm) into a list of specific names and feed them into
SQ or USQ.
The above examples let the destination drive default to the
current logged drive, which was shown in the prompt to be A.
You can change the destination drive as often as you like in
the parameter list. For example,
A>sq x.asm b: y.asm z.asm c: d:s.asm
will create x.aqm on the current drive, A, y.aqm and z.aqm
on the B drive and s.aqm on the C drive. Note that the first
three originals are on drive A and the last one is on drive
D. Remember that each parameter is processed in order, so
you must change the destination drive before you specify the
files to be created on that drive.
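The in-order rule can be sketched in a few lines. This is an
illustration only; the function name plan_destinations is
hypothetical and the sketch does no squeezing, it merely pairs
each source file with the destination drive in effect when it
is reached.

```python
def plan_destinations(params, current_drive="A"):
    """Pair each file parameter with the destination drive then in effect."""
    dest = current_drive
    plan = []
    for p in params:
        if len(p) == 2 and p[0].isalpha() and p[1] == ":":
            dest = p[0].upper()      # a bare "x:" changes the destination
        else:
            plan.append((p, dest))   # anything else is a file to process
    return plan
```

Given the parameters x.asm b: y.asm z.asm c: d:s.asm, the
sketch pairs x.asm with A, y.asm and z.asm with B, and d:s.asm
with C.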
Eventually you will have diskettes with many squeezed files
on them and you will wonder what is in which file. If they
weren't squeezed you would use the TYPE command to look at
the comments at the beginning of the files. But squeezed
files just make a mess on your CRT screen when you TYPE
them, so I have provided the required feature as a preview
option to the USQ program.
A>usq -10 x.bqs b:y.aqm
will not take the time to create unsqueezed files. Instead
it will unsqueeze the first 10 lines of each file and
display them on your console. The display from each file
consists of the file names, the data and a formfeed (FF).
Also,
A>usq - c:xyz.mqc
will unsqueeze and display the first 65,535 lines of any
files listed. That's the biggest number you can give it, and
is intended to display the whole file.
This preview option also ensures that the data is
displayable. The parity bit is stripped off (some Wordstar
files use it for format control) and any unusual control
characters are converted to periods. You'll see some of
these at the end of the files as the CP/M end of file is
treated as data and the remainder of the sector is
displayed.
You are now familiar with all of the operational parameters
of SQ and USQ. But so far you have always typed them on the
command line which caused the program to be run. For reasons
which will become apparent later, I have also provided an
interactive mode. If there are no parameters (except
directed i/o parameters, described later) on the command
line, SQ and USQ will prompt with an asterisk and accept
parameters from the console keyboard. Each parameter must be
followed by RETURN and will be processed immediately. An
empty command (just RETURN) will cause the program to exit
back to CP/M. Try it - it will help you understand what
follows.
Now let's get into directed i/o, which will be new to most of
you, but will save you so much work you will wonder how you
ever got along without it.
Perhaps you frequently squeeze or unsqueeze the same list of
files and you would like to type the list once and be done
with it. Use an editor (or FLS, described below) to create a
file with one parameter per line. For example call it
commands.lst.
Then,
A>sq <commands.lst
will cause the command list file to be read as if you were
typing it!
That was redirected console input. Now assume that you have
a very long list of files to squeeze or unsqueeze and while
you are taking a nap the progress comments and maybe some
error comments scroll off the screen. Redirecting the
console output will let you capture the progress
information in a file so you can check it later. The error
comments will have the screen to themselves.
For example,
A>sq <commands.lst >out
will send the progress comments to the file "out", which you
can TYPE later. The routine display of the program name and
version, etc., will still go to the console.
A more practical example is to send that information to the
console and to the file.
A>sq <commands.lst +out
will do that.
Redirected input and output are independent - you can do
either, both or neither.
There is one more form of redirection called a "pipe". It is
by far the most important to you. Recall that I promised to
tell you how to use ambiguous file names such as *.asm (all
files of type asm on the current default drive) or *.?q?
(all files having a "q" as the second letter of their type).
That last example just happens to mean "all squeezed files",
assuming you don't have any other files with such a silly
name (I hope).
I have provided a program called FLS which is intended
primarily for use in pipes. Here is an example:
A>fls c: x.asm y*.asm >temp.$$$
will simply pass the first two parameters through to the
console output, which is being redirected to a file called
temp.$$$. But the third parameter will be replaced by all
the files on the current drive which are of type asm and
have names beginning with y.
FLS is smart enough to know that a letter followed by a
colon and nothing else is a destination drive name intended
for SQ or USQ. It will also treat any parameter beginning
with a - (minus sign) as an option to be passed through.
Anything else is considered a file name or pattern and is
checked against the directory of the appropriate drive.
Therefore you could use:
A>fls b: c:*.aqm *.aqm -10 stuff.dqc >temp.$$$
A>usq <temp.$$$
A>era temp.$$$
to unsqueeze all files of type aqm on drives C and A and put
the unsqueezed files on drive B, and then preview the first
10 lines of file stuff.dqc.
Here is where the pipe comes in. The above three commands
can be abbreviated as:
A>fls b: c:*.aqm *.aqm -10 stuff.dqc |usq
That little "|" is the pipe option and it causes the FLS
output to be redirected to a temporary file and when that is
done it actually runs USQ for you with the proper input
redirection and then erases the temporary file.
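The sequence of steps behind the pipe can be sketched as
follows. The two "programs" here are ordinary functions passed
in by the caller, and the name run_pipe is hypothetical; the
point is only the order of events: write everything to a
temporary file, run the second program on it, then erase it.

```python
import os
import tempfile


def run_pipe(producer, consumer):
    """Run producer(out_file), then consumer(in_file), via a temporary file."""
    fd, path = tempfile.mkstemp(suffix=".tmp", text=True)
    try:
        with os.fdopen(fd, "w") as out:
            producer(out)            # first program runs to completion
        with open(path) as inp:
            return consumer(inp)     # second program reads the captured output
    finally:
        os.remove(path)              # the temporary file is erased afterwards
```

Unlike a pipe on a multitasking system, the second program does
not start until the first has finished.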
If that isn't enough, you can still use the + or >
redirection option at the end of that line to capture the
console output from USQ.
A>fls b: c:*.aqm *.aqm -10 stuff.dqc |usq >out
If you plan your comments carefully you can produce a single
file containing an abstract of an entire library of squeezed
files in one step!
A>fls -25 *.?q? |usq >abstract
One final point. Anywhere you specify a file name you can
specify a drive in front of it. That applies to redirection
as well as files to be squeezed and unsqueezed. If a name
begins with a - (minus sign) it will look like an option to
FLS unless you put a drive name in front of it (b:-sq.077).
USAGE SUMMARY:
The previous section gradually presented the various options
by example. This section gives a condensed and more abstract
description and is intended for reference. If you couldn't
see the forest for the trees, maybe this will give you a
better view.
The parameter handling of these programs is straightforward.
Parameters fall into two classes: directed i/o options and
operational parameters. Note that parameters read from files
or from the console are not forced to upper case, but the
internal file handling routines all treat lower case as
upper case.
When a file to be written already exists, it is quietly
overwritten.
Directed I/O parameters:
The first action taken by these programs is to process
directed i/o parameters from the CP/M command line. These
parameters are optional and take the forms:
<file       read console input from file
>file       send most console output to file
+file       send most console output to file and console
|pgm ...    send most console output to a temporary file,
            then run PGM.COM and take console input
            from the temporary file. "..." represents the
            parameters for PGM. This is called "piping".
Only one input and one output redirection can apply to each
program. After the program has arranged for any directed i/o
parameters to be obeyed they are deleted from the parameter
list seen by the rest of the program.
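A rough sketch of that separation step is shown below. It is
not the DIO code; the function name and the returned dictionary
layout are assumptions, and where the same kind of redirection
is given twice the sketch simply lets the last one win.

```python
def split_redirection(args):
    """Split a parameter list into directed i/o settings and the rest."""
    io = {"stdin": None, "stdout": None, "tee": False, "pipe": None}
    rest = []
    for i, arg in enumerate(args):
        if arg.startswith("<"):
            io["stdin"] = arg[1:]
        elif arg.startswith(">"):
            io["stdout"], io["tee"] = arg[1:], False
        elif arg.startswith("+"):
            io["stdout"], io["tee"] = arg[1:], True
        elif arg.startswith("|"):
            # Everything after |pgm belongs to the piped-to program.
            io["pipe"] = [arg[1:]] + list(args[i + 1:])
            break
        else:
            rest.append(arg)         # operational parameter, kept in order
    return io, rest
```

For the parameters -10 b:*.cq |usq +saveout, the sketch keeps
-10 and b:*.cq as operational parameters and records usq
+saveout as the pipe target, so +saveout is left for USQ to
interpret.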
Operational parameters:
The program then checks if there are any remaining
parameters from the CP/M command line. If there are, they
are obeyed. If and only if there are no remaining parameters
on the command line, the program prompts for them at the
console. If console input has been directed to a file one
parameter is read and obeyed from each line of the file.
Otherwise, the user follows each typed parameter with a
RETURN and an empty command exits the program.
Each operational parameter is obeyed without looking ahead
to other parameters, so options should precede the file
names to which they apply.
SQ operational parameters are a list of the following types:
drive:            set the current destination drive
filename          file to be squeezed
drive:filename    file to be squeezed
-                 Toggle debug mode (dumps tables)
SQ does not change the files being squeezed. New, squeezed
files are created on the destination drive (defaults to the
current drive) with names derived from the original name but
with the second letter of the file type (extension) changed
to Q. When there is no type, QQQ is used. The original name
is saved in the squeezed file.
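The naming rule can be written out as a short sketch. The
function name squeezed_name is hypothetical, and it handles a
bare file name only (no drive prefix).

```python
def squeezed_name(filename: str) -> str:
    """Apply the naming rule: second letter of the type becomes q."""
    base, _, typ = filename.partition(".")
    if not typ:
        return base + ".qqq"                 # no type at all: use QQQ
    # Keep the first letter, force the second to q, keep any third letter.
    return base + "." + typ[0] + "q" + typ[2:]
```

This reproduces the names used in this document: xyz.asm gives
xyz.aqm, thisfile.doc gives thisfile.dqc and xyz.c gives
xyz.cq.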
USQ operational parameters are a list of the following
types:
drive:            set the current destination drive
filename          file to be unsqueezed
drive:filename    file to be unsqueezed
-count            Preview (display on the console) the first
                  "count" lines of each file, where
                  "count" is a number from 1 to 65535.
If the -count option IS NOT in effect then USQ creates
unsqueezed versions of the listed files on the destination
drive, which defaults to the current logged drive. Each
unsqueezed file is CRC checked against the CRC value of the
original file, which is part of the squeezed file.
The -count option is for previewing squeezed files. It
allows you to skim through a group of squeezed files,
peeking at the first "count" lines in each. The > or +
output redirection option could be used to capture this
information in a file, along with the corresponding file
names, thus forming an abstract of the files on a disk.
When the -count option is used the CRC check is cancelled
and the output is forced into printable form by stripping
the parity bit and changing most unprintable characters to
periods. The exceptions are CR, LF, TAB and FF. The output
from each file is terminated by an FF. PIP can be used to
strip FFs and provide formatted printing if desired. "Count"
defaults to the maximum value, 65,535, in case you want to
look at a whole file.
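The filtering applied by the preview can be sketched like this.
It is an illustration, not the USQ code: the function name is
hypothetical, lines are counted by LF characters, DEL is
treated as unprintable (an assumption), and the trailing FF
that ends each file's output is left out.

```python
KEEP = {0x0D, 0x0A, 0x09, 0x0C}  # CR, LF, TAB and FF pass through


def preview(data: bytes, count: int = 65535) -> str:
    """Return the first `count` lines of data in displayable form."""
    out = []
    lines = 0
    for byte in data:
        ch = byte & 0x7F                     # strip the parity (high) bit
        if (ch < 0x20 and ch not in KEEP) or ch == 0x7F:
            ch = ord(".")                    # unprintable: show a period
        out.append(chr(ch))
        if ch == 0x0A:                       # count completed lines
            lines += 1
            if lines >= count:
                break
    return "".join(out)
```

Note that the CP/M end-of-file character (1A hex) is just
another control character here, so it shows up as a period.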
FLS operational parameters: FLS is a "filter", which means
it accepts input from the console input or command line and
transforms the input according to a set of rules to produce
console output. That's fine for getting familiar with FLS,
but to make it useful you "pipe" its output to the input of
SQ or USQ.
Any FLS parameter which is of the form:
drive:
or -anything
is copied to console output unchanged.
Any other FLS operational parameter is treated as a file
name and is checked against the directory of the appropriate
drive. If it contains * or ? it is replaced by a list of all
the files which fit the pattern. If nothing is found in the
directory an error comment is sent to the console, even if
normal console output has been redirected to a file.
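The FLS rules above can be sketched as follows. This is not the
FLS source. The directory is passed in as a plain dictionary of
drive letter to file names, all function names are
hypothetical, and pattern matching is done on space-padded 8+3
character fields on the assumption that it behaves like a CP/M
directory search (where ? also matches a blank position). The
form in which matching names are written out (upper case, with
the drive prefix that was given) is likewise an assumption.

```python
def fcb_fields(name):
    """Return the 11-character padded form of NAME.TYP, with * expanded."""
    base, _, typ = name.upper().partition(".")

    def pad(field, width):
        if "*" in field:
            # "*" means "?" in every remaining position of the field.
            field = field[: field.index("*")] + "?" * width
        return field[:width].ljust(width)

    return pad(base, 8) + pad(typ, 3)


def matches(pattern, filename):
    """True if filename fits the (possibly ambiguous) pattern."""
    return all(p in ("?", c)
               for p, c in zip(fcb_fields(pattern), fcb_fields(filename)))


def expand(params, directory, current_drive="A"):
    """Return (output parameters, parameters for which nothing was found)."""
    output, errors = [], []
    for p in params:
        is_drive = len(p) == 2 and p[0].isalpha() and p[1] == ":"
        if is_drive or p.startswith("-"):
            output.append(p)                 # passed through unchanged
            continue
        if p[1:2] == ":":
            prefix, name, drive = p[:2], p[2:], p[0].upper()
        else:
            prefix, name, drive = "", p, current_drive
        found = [f for f in directory.get(drive, []) if matches(name, f)]
        if found:
            output.extend(prefix + f for f in found)
        else:
            errors.append(p)                 # would be reported on the console
    return output, errors
```

With a B drive holding P.CQ, Q.AQM and R.C, the parameters -10
b:*.?q? expand to -10, b:P.CQ and b:Q.AQM under these
assumptions.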
IMPORTANT: when using a pipe from FLS or any other input
redirection to get the file list, etc., on which USQ or SQ
are to operate you must NOT put any parameters other than
redirection following the program name. The operational
parameters must be all together in the input parameter list.
Example:
A>fls -10 b:*.cq |usq +saveout
is the proper way to preview the top (first 10 lines) of
each squeezed .C file on the B drive. The -10 is passed
through FLS to USQ. The results will be displayed on the
console and saved in file "saveout" on the A drive. The
saveout file lets you confirm the list of processed files
even if the display scrolls off the screen while running
unattended.
In summary, i/o redirection parameters (those prefixed by +,
<, >, or |) always follow the command to which they apply,
but operational parameters (destination drive, -options)
must be with the file name list.
EXAMPLES:
1. Unsqueeze all squeezed files on the current drive and put
the resulting unsqueezed files on the same drive.
A>fls *.?q? |usq
2. Look at the first 10 lines of every squeezed file on
drive B.
A>fls -10 b:*.?Q? |usq
Note that since the file names for USQ came from FLS, the
count option had to come from there too.
4. Squeeze all .ASM files on the B and C drives and put the
squeezed files on the D drive.
A>fls d: b:*.asm c:*.asm |sq
Note that if d: had not been first the squeezed files would
have gone to the A drive.
5. Squeeze file xyz.c on the A drive and put the results on
the A drive.
A>sq xyz.c
6. Build a parameter list of all ASM files on drive C in
file XX.PAR and view it on the console.
A>fls c:*.asm +xx.par
7. Use the above list to squeeze the files to the A drive.
A>sq <xx.par
8. As above, but results to the B drive.
A>b:
B>a:sq <a:xx.par
9. Squeeze all ASM and C files on the A drive and put the
results on the B drive. Capture the progress comments in the
file "out" without displaying them.
A>fls b: *.asm *.c |sq >out
10. Preview the first 24 lines of each squeezed ASM file
THEN unsqueeze them (unless stopped via cntl-C).
A>fls -24 *.aqm a: *.aqm |usq
Note that specification of a destination drive cancels
previewing.
RECOMPILATION:
These programs are written in C and the instructions are for
the BDS C compiler. The libraries must have been adapted for
directed i/o as described in DIO2.C.
The procedures below indicate the various C language source
files (file type .C) required to recompile. Those files
contain #include statements which cause header files (file
type .H) to be read and compiled. The BDSCIO.H header file
contains information about your system, including how much
space to reserve for file buffers. You should use your own
version of this file.
The source files DIO2.C, SQDIO.C and USQDIO.C are identical!
If you only get one, just use PIP to create the rest. They
are separate only to provide separate CRL files, which are
needed because of the different external variable options.
Note that they do not include all the header files,
therefore the other source files must include the dio
related headers first.
DIO.C is supplied with BDS C. The above three files differ
from the official version only by a change to the dioflush
function to ensure TEMPIN.$$$ is deleted before another file
is renamed to that name. (CP/M is stupid enough to make two
files of the same name!).
The procedure for building the SQ.COM, USQ.COM and FLS.COM
files from their source files follows. Note that I have
renamed the first phase of the BDS C compiler to CC.COM.
Also I will assume the BDS C package is on drive D and the
SQ and USQ related files are on B along with BDSCIO.H and
DIO.H.
Each CC command produces a CRL file with specific addresses
for external variables. If you recompile a file with the
same value in the -e option you don't have to recompile the
other files, just do the desired CC and then repeat the
entire CLINK.
CLINK's -s option prints statistics. Top of memory means the
current TPA. Stack space is what's left over. These programs
require stack space for local variables, including some
healthy i/o buffers. Also some functions are recursive. If
SQ doesn't have several K of stack space it will probably go
crazy and do almost anything.
If you have .CQ and .HQ files instead of .C and .H files you
must use USQ, probably with FLS, as described above to make
the .C and .H files.
For SQ (note not all use -o):
D>cc b:sq.c -o -e3600
D>a:pip b:sqdio.c=b:dio2.c
D>cc b:sqdio.c -e3600
D>cc b:tr1.c -o -e3600
D>cc b:tr2.c -o -e3600
D>cc b:io.c -o -e3600
D>cc b:sqdebug.c -e3600
D>clink b:sq sqdio tr2 tr1 io sqdebug -s
The linker will display some statistics. Check that the last
code address is less than the start address of the external
variables (3600 in this example). If not, repeat the above
with a higher address in the -e options.
For USQ (note not all use -o):
D>cc b:usq.c -o -e2900
D>a:pip b:usqdio.c=b:dio2.c
D>cc b:usqdio.c -e2900
D>cc b:utr.c -o -e2900
D>clink b:usq usqdio utr -s
Check the addresses as described above.
For FLS:
D>cc b:fls.c
D>cc b:dio2.c
D>clink b:fls dio2
IN CASE OF TROUBLE:
I welcome suggestions and bug reports, but you must
understand that some of the ideas I get would involve almost
as much program development as the original package. I have
what I want and (I hope) what most users want, so I am not
motivated to spend many more months creating something
entirely different which just happens to involve data
compression. The data compression routines are probably less
than half of this package, and are designed to operate on
large blocks of data, such as files.
The - option recently added to SQ can be used to dump critical
tables if you are having trouble and need to ask for help. Just
run the program with control-P on the command line to get hard
copy. The last table gives the lengths of the bit codes used.
Dick Greenlaw
251 Colony Ct.
Gahanna, Ohio 43230
614-475-0172 weekends and evenings