How to Use File Compression
Rick Weerts
Fred, a novice modem user, heads towards the download section of his
favorite Bulletin Board System. He pulls off about four files and logs
off an hour later.
Fred then attempts to execute the programs, expecting marvelous things
on his microcomputer. But no matter what he does or who he calls, the
files he received refuse to work.
Fred is the victim of file compression. Numerous utilities exist today
which allow the modem user to compress a file before transmission to a
board or another user. This compression saves on users' time and phone
bills, and the BBS itself receives more room for download files.
Nevertheless, unless the files are converted properly after they are
downloaded, they are useless, as the computer cannot read them. This
article attempts to cut through some of the fog regarding file
compression and its effects on the bulletin board community...
The Concept
-----------
Most files on a bulletin board (and most others on microcomputers in
general) make use of a lot of text (alphabetic) information. This
information (when stored in a standard format known as ASCII) can be
analyzed by virtually any other computer system operating today.
However, with the invention of 8 bit protocols that are used for most
telecommunications, this storage format is wasteful. It is also much
more expensive when telephone usage is paid for by the hour.
Before I go on, it is important to explain the bit protocols and the
information they attempt to represent. As most people know, each piece
of information in a computer is coded with a series of 1's and 0's. The
computer can interpret and act on these codes. A combination of seven
1's and 0's is called a 7 bit protocol (one bit for each digit in the
combination). This protocol makes use of the ASCII character standards.
For you math wizards out there, the combination of seven binary bits of
1's and 0's allows for 128 numeric combinations. Each number in the 128
numeric combinations stands for a specific alphanumeric code. These
codes much up the ASCII table and include all alphabetic and some
special characters the computer generates. The numeric representation
of characters is what the computer deals in, not the characters
themselves.
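As a small illustration of the point above, the following Python sketch prints the numeric code and the seven-bit pattern behind each letter of a word. The sample word and the variable names are chosen only for this example.

```python
# Number of distinct values that seven binary digits can represent.
SEVEN_BIT_COMBINATIONS = 2 ** 7  # 128

word = "Fred"  # sample text, chosen only for illustration

# ord() gives the numeric code the computer actually stores for a character.
codes = [ord(ch) for ch in word]

for ch, code in zip(word, codes):
    # format(code, "07b") shows the code as seven binary digits.
    print(ch, code, format(code, "07b"))

# Every standard ASCII code fits in the range 0..127.
all_fit_in_seven_bits = all(code < SEVEN_BIT_COMBINATIONS for code in codes)
```

Going the other way, chr(70) returns "F", which is all a terminal does when it displays incoming codes as characters.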
However, my IBM can generate 256 CHR$(x) codes! And I just said there
were only 128?
IBM and most other modern computer makers have also included an 8th
(high) bit in the character codes for their systems. This additional bit
has a value of 128 (the eight bit values are 1, 2, 4, 8, 16, 32, 64 and
128), so it adds another 128 possible combinations to the previous 128
on the original ASCII table, for 256 in all. The additional 128 codes
can represent additional symbols or even block graphic characters.
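The arithmetic behind the eighth bit can be checked with a few lines of Python. This is only a sketch; the names HIGH_BIT and uses_high_bit are invented for the example.

```python
# Place values of the eight bits in one character code.
BIT_VALUES = [1, 2, 4, 8, 16, 32, 64, 128]

SEVEN_BIT_CODES = 2 ** 7   # codes 0..127, the original ASCII table
EIGHT_BIT_CODES = 2 ** 8   # codes 0..255 once the high bit is available
EXTRA_CODES = EIGHT_BIT_CODES - SEVEN_BIT_CODES

# The largest code is reached when every bit is set.
LARGEST_CODE = sum(BIT_VALUES)

HIGH_BIT = 0b10000000  # the 8th bit, worth 128


def uses_high_bit(code):
    """Return True if a character code lies in the extra upper half (128..255)."""
    return bool(code & HIGH_BIT)
```

For instance, uses_high_bit(65) is False for the letter "A", while uses_high_bit(200) is True for a code from the extended set.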
I am not going to get into the specifics on bit protocols. The idea
behind file compression (squeezing, compressing, crunching, whatever you
want to call it) is to make use of these extra 128 characters that are
used less than the first 128. These extra characters can stand for
twelve spaces, sixteen hyphens, or whatever else the compression
software allows. This coding allows more information to be included in
the same amount of space. Since the computer is using all 256
characters in such an environment, you must transmit using the 8 bit
modem protocol (8-N-1, referring to eight bits, no parity, and one stop
bit).
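To make the idea concrete, here is a toy scheme in Python in which a byte above 127 means "repeat the next character this many times". It is an assumption made purely for illustration and is not the format used by SQ, ARC, or any other real utility; the function names are likewise invented.

```python
def toy_compress(text):
    """Replace runs of 3..127 identical characters with a two-byte code."""
    data = text.encode("ascii")  # plain 7-bit text only
    out = bytearray()
    i = 0
    while i < len(data):
        ch = data[i]
        run = 1
        # Count how many times the same character repeats (at most 127).
        while i + run < len(data) and data[i + run] == ch and run < 127:
            run += 1
        if run >= 3:
            out.append(128 + run)  # high-bit byte carries the run length
            out.append(ch)         # followed by the character to repeat
        else:
            out.extend(data[i:i + run])  # short runs are copied as-is
        i += run
    return bytes(out)


def toy_expand(packed):
    """Reverse toy_compress, restoring the original text."""
    out = bytearray()
    i = 0
    while i < len(packed):
        b = packed[i]
        if b >= 128:
            out.extend(bytes([packed[i + 1]]) * (b - 128))
            i += 2
        else:
            out.append(b)
            i += 1
    return out.decode("ascii")


sample = "Name" + " " * 12 + "Size\n" + "-" * 16 + "\n"
packed = toy_compress(sample)
```

Here the twelve spaces and sixteen hyphens each shrink to two bytes, so the 38-character sample packs into 14 bytes, and toy_expand(packed) gives back the original text. Because the packed data uses codes above 127, it would be damaged by a 7 bit connection.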
Putting it into Action
----------------------
When a file is compressed, it passes through a utility program that
replaces the "white space" and other repetition in the source document
with shorter codes, resulting in a compact file. In some cases, the
space savings can be more than 50% of
the previous file size. However, the file is now in a format that the
computer cannot read "stand-alone". It requires special software to
interpret how to unsqueeze the file into standard storage specs. In
addition, the file may be stored in a library or archive before or after
it is compressed. The file is stored together with other files under
one single filename on the disk. This process can then be reversed by
the end user. The time savings involved in this clever process becomes
readily apparent. Compare the time it takes to download one 200K file
versus twenty 10K files, each with its own distinct filename.
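The idea of storing several files under one filename can be sketched in Python as follows. The layout used here (a count, then each name and length followed by the data) is a made-up format for illustration only; it is not the real LBR or ARC layout, and the function names are hypothetical.

```python
import struct


def pack_library(members):
    """Combine a dict of {filename: bytes} into one block of bytes."""
    out = bytearray(struct.pack("<H", len(members)))  # number of members
    for name, data in members.items():
        raw_name = name.encode("ascii")
        # One byte for the name length, four bytes for the data length.
        out += struct.pack("<BI", len(raw_name), len(data))
        out += raw_name + data
    return bytes(out)


def unpack_library(blob):
    """Reverse pack_library, returning the {filename: bytes} dict."""
    (count,) = struct.unpack_from("<H", blob, 0)
    pos = 2
    members = {}
    for _ in range(count):
        name_len, size = struct.unpack_from("<BI", blob, pos)
        pos += 5
        name = blob[pos:pos + name_len].decode("ascii")
        pos += name_len
        members[name] = blob[pos:pos + size]
        pos += size
    return members


files = {"FIREFLY.EXE": b"\x00\x01\x02", "FIREFLY.DOC": b"How to play"}
library = pack_library(files)
```

The single block in library can be sent as one download, and unpack_library(library) returns the same two files on the receiving end.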
File Extensions
---------------
Most disk files have an eight letter name and a three letter extension
(separated by a period or slash). Often, the extension indicates what
method of compressing or archiving was used. Below are some common file
extensions BBSs and archiving software use to denote files which use
packing techniques.
File Name                                 Packing Method
=============================================================
FIREFLY.EXE    Original File              None
FIREFLY.EQE                               Squeeze (SQ)
FIREFLY.LBR                               Library (LU)
FIREFLY.LQR                               (SQ), (LU)
FIREFLY.ARC                               Archive (ARC)

FIDONWS.DOC    Original Text File         None
FIDONWS.DQC                               Squeeze (SQ)
FIDONWS.LBR                               Library (LU)
FIDONWS.LQR                               (SQ), (LU)
FIDONWS.ARC                               Archive (ARC)

BOOMERS.BAS    Original BASIC Program     Binary
BOOMERS.BQS                               Squeeze (SQ)
BOOMERS.LBR                               Library (LU)
BOOMERS.LQR                               (SQ), (LU)
BOOMERS.ARC                               Archive (ARC)
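The squeezed names in the table follow a simple pattern: the middle letter of the extension becomes a Q. A minimal Python sketch of that pattern is shown below; squeezed_name is a hypothetical helper, and it assumes a three-letter extension as in the table.

```python
def squeezed_name(filename):
    """Return the name a squeezed copy would carry, e.g. X.EXE -> X.EQE."""
    base, dot, ext = filename.rpartition(".")
    if not dot or len(ext) != 3:
        # The table only covers three-letter extensions.
        raise ValueError("expected a name with a three-letter extension")
    return base + "." + ext[0] + "Q" + ext[2]


examples = ["FIREFLY.EXE", "FIDONWS.DOC", "BOOMERS.BAS", "FIREFLY.LBR"]
squeezed = [squeezed_name(name) for name in examples]
```

Applied to the original names in the table, this gives FIREFLY.EQE, FIDONWS.DQC, BOOMERS.BQS and FIREFLY.LQR.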
Squeeze and Unsqueeze
---------------------
SQ and SQPC are two of the first available software packages designed to
compress files into the smallest possible form. AUSQ, UNSQ, and NUSQ
are their counterparts. They put files back into expanded format on
request. Squeezed files usually have a Q as the second letter of their
three-letter file name extension. Simply typing AUSQ or SQPC alone on a
command line at the DOS level brings up a small help screen that shows
how to operate the system. Since there are many such programs on the
market, my object is to explain the concept behind them, not how a
specific package operates. Nevertheless, the documentation on these
packages is usually enough to operate them successfully.
The Data LIBRARY
----------------
LU.EXE is the original library (LBR) utility. It allows the packing
(and unpacking) of files into one large file. LUed files usually have a
LBR file extension. In the same respect, a LQR extension indicates that
the file must be unsqueezed (using AUSQ or NUSQ) BEFORE it is converted
with the library utility. Also, libraries may consist of one or more
libraries residing inside of each other, some squeezed beforehand. As
you can see, the standards for the file extensions are important to
follow when dealing with such a variety of systems.
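The extension rules described so far can be collected into a short Python sketch that reports which steps a downloaded file needs, in order. The function name and the step labels are invented for this example, and a real file may of course not follow the naming convention.

```python
def unpack_steps(filename):
    """Guess, from the extension alone, the steps needed to restore a file."""
    if "." not in filename:
        return []
    ext = filename.rsplit(".", 1)[1].upper()
    if ext == "ARC":
        return ["ARC"]              # archive: ARC does everything itself
    if ext == "LBR":
        return ["LU"]               # plain library: just unpack it
    if ext == "LQR":
        return ["unsqueeze", "LU"]  # squeezed library: unsqueeze FIRST
    if len(ext) == 3 and ext[1] == "Q":
        return ["unsqueeze"]        # Q as the second letter: squeezed file
    return []                       # nothing to do, as far as the name shows
```

For example, unpack_steps("FIREFLY.LQR") returns ["unsqueeze", "LU"], while unpack_steps("BOOMERS.BAS") returns an empty list.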
Usually, typing LU will give a brief command line summary of the
function. The library utility usually provides a command which will
remove all the files from the library file to stand-alone files. The
LBR file will then serve as a compressed backup of the information you
have unpacked.
It is usually handy to unpack a library by putting it in its own
subdirectory (DOS MKDIR Command). In this way, it becomes clearly
evident which files have been removed from the library. You will not
get them confused with other files with similar or identical names. You
can then move the files (DOS COPY Command) outside of the subdirectory
or onto the disk of your choosing.
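The same habit carries over to any scripting language. The Python sketch below writes a library's members into a directory named after the library. It assumes the members have already been extracted into a dict of names and contents; the function and parameter names are hypothetical.

```python
from pathlib import Path


def unpack_to_own_directory(library_name, members, parent="."):
    """Write each member into a directory named after the library file."""
    # FIREFLY.LBR is unpacked into a directory called FIREFLY.
    target = Path(parent) / Path(library_name).stem
    target.mkdir(parents=True, exist_ok=True)
    for name, data in members.items():
        (target / name).write_bytes(data)
    return target
```

Because everything lands in one fresh directory, it is obvious which files came out of the library, and nothing with the same name elsewhere on the disk is overwritten.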
Archiving Systems
-----------------
Finally, we come to ARC.EXE, short for Archive. This handy little
utility takes all the guesswork out of squeezing and packing files into
an archive (or library). ARC automatically decides the best way to
compress a file and then adds it to the archive. ARC also unpacks the
file in the same way, eliminating the squeeze step of the process. The
archive utility is compact and it makes the other packing schemes
obsolete. Of course, if you have only one file to pack, you may only
want to squeeze it. In this case SQ and AUSQ come into play.
Typing ARC at the DOS command line prompt causes the program to supply
you with an informative help screen. To unpack all the files from an
archive, you type ARC E archive_name. Addition of files to an archive
is just as simple.
Once a file has been archived, there is no need for further squeezing.
The file has been squeezed as tightly as possible and any further
attempts at compression will only add to the file's size. Files with
the file name extension .ARC are archive files.
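The claim that a second pass only adds to the size can be tried out in Python. The sketch below uses the standard zlib module as a stand-in compressor, which is an assumption made for convenience; it is not the method ARC uses, but the effect is similar.

```python
import zlib

# Ordinary English text, used only as sample data.
sample = (
    "Fred, a novice modem user, heads towards the download section of his "
    "favorite Bulletin Board System. He pulls off about four files and logs "
    "off an hour later. Fred then attempts to execute the programs, expecting "
    "marvelous things on his microcomputer."
).encode("ascii")

once = zlib.compress(sample)   # first pass: the text shrinks
twice = zlib.compress(once)    # second pass: nothing left to squeeze out

print(len(sample), len(once), len(twice))
```

The first pass removes the redundancy in the text. The second pass finds none, so all it contributes is its own bookkeeping bytes.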
The advice about putting the archive data files in separate
directories still stands. This technique certainly makes for a much
easier time of packing and unpacking.
What All This Means to Me
-------------------------
Archiving and squeezing are not requirements (in most cases) before a
file is transmitted to a bulletin board. However, most BBS
system operators will ARC or squeeze the files they receive from users.
Compressing the files ahead of time saves time for the sysop and also
allows more room on the BBS for additional download files, good for
everyone involved. Also, a sysop of a Commodore board, for example,
will probably not have the capability to squeeze files intended for IBM
systems.
Compressing files is also a good idea from the KISS (Keep It Simple,
Stupid!) concept of file transfer. All the files necessary for a
software system to run should be placed under one archive name. The
users of the BBS are much more likely to get a working system than if
they have to sift through 500 files until they find all the correct ones
for that particular system. It also allows for easy updates when you
improve the software. Archiving makes sure there is only one filename
to delete and one to add.
Finally, compressing files SAVES MONEY! Compressed files can be shrunk
more than 50%, cutting AT&T's share of a long distance call in half.
They are also convenient for pay services such as CompuServe or The
Source, where every second costs money. And it saves money in
both directions, as both the sender and the receiver benefit from lower
bills.
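A rough feel for the savings comes from a little arithmetic, sketched here in Python. The 1200 bits per second modem speed and the 200K file size are assumptions picked for the example, and the figure of 10 bits per byte comes from 8-N-1 framing (one start bit, eight data bits, one stop bit). Overhead from the transfer protocol itself is ignored.

```python
def transfer_seconds(size_bytes, bits_per_second, bits_per_byte=10):
    """Estimate raw transfer time; 8-N-1 sends 10 bits for every byte."""
    return size_bytes * bits_per_byte / bits_per_second


MODEM_SPEED = 1200            # assumed modem speed in bits per second
ORIGINAL_SIZE = 200 * 1024    # a 200K file, in bytes
COMPRESSED_SIZE = ORIGINAL_SIZE // 2  # assuming a 50% space savings

original_minutes = transfer_seconds(ORIGINAL_SIZE, MODEM_SPEED) / 60
compressed_minutes = transfer_seconds(COMPRESSED_SIZE, MODEM_SPEED) / 60

print(round(original_minutes, 1), round(compressed_minutes, 1))
```

Under these assumptions the uncompressed file ties up the phone line for about 28 minutes and the compressed one for about 14.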
So the next time you send a file to your favorite BBS, do everyone a
favor and do a squeeze play on Ma Bell.
Other Confusing Items
---------------------
After writing this piece, I noticed that I fluttered from one verb usage
to another with uncanny regularity. So below is a list of terms and
their (my) definitions.
Archive        File(s) that are squeezed and lumped under a
               single heading by the ARC.EXE package.
ASCII          Seven bit protocol standard agreed upon by
               all major microcomputer manufacturers.
Compress       To make a file smaller by shrinking the
               space the file occupies.
Crunch         Same as compress.
Library        File(s) lumped under a single heading by
               LU.EXE or a similar package.
Pack           Placing numerous files under a single
               heading by either the ARC or LU utilities.
Protocol       Code of 1's and 0's indicating
               characters stored in a computer's
               memory.
Squeeze        Same as compress.
Unpack         Remove from library or archive one or more
               sections into stand-alone files.
Unsqueeze      Return file to original structure.
I hope you will find the above article helpful. If you have any
questions, comments, additions, corrections, gripes, etc., please send
them to me. I will make an attempt to respond as soon as possible. Try
to leave a Fido, GEnie or CompuServe address.
December 4, 1985
Rick Weerts