23-Apr-86 19:32:57-PST,5277;000000000001
Date: Wed, 23 Apr 86 20:04:12 EST
From: Edward_Vielmetti%UMich-MTS.Mailnet@MIT-MULTICS.ARPA
To: info-ibmpc@USC-ISIB.ARPA
Subject: Breakup.doc
Documentation for BREAKUP
December, 1983
Charles Roth
BREAKUP is a utility for "breaking up" large files on MSDOS systems. It
is intended primarily as a companion utility for The FinalWord editor, but it
may have other uses as well. BREAKUP was written in C by Charles Roth, and
is in the public domain.
Many text editors for microcomputers can only deal with files of a
certain maximum size. We know, by Murphy's Law, that some files will
always be larger than any given size. Thus, there is a need to be able
to break up large files in a convenient way. Also, when moving a collection
of files from one machine to another, it is sometimes easier to concatenate
all of the files together, ship them as one file, and then break them up
again.
BREAKUP uses Unix-ish command arguments to allow the user to break up
a file in a variety of ways. Each "command" is actually a pair of
arguments that specify the next place to break the file. The user can tell
BREAKUP to break after so many bytes, or after so many lines, or when a
particular string is encountered. The general syntax looks like:
BREAKUP File.Ext -C1 A1 -C2 A2 -C3 A3 ...
where "File.Ext" is the name of the file to be broken, and each "-Cn An" pair
specifies a breaking point (as described below). The pieces of the broken-up
file are put in the files File.000, File.001, and so on. Note that there is
usually one more file than the number of breaking points specified.
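The naming rule for the pieces can be sketched in C, the language BREAKUP was written in. The function piece_name below is hypothetical and is not taken from BREAKUP's source. It assumes the extension is whatever follows the last "." in the final path component, keeps any drive or directory prefix, and appends the three-digit piece number.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from BREAKUP's source): stores in `out` the name
   of piece number n for the input file `src`, e.g. "File.Ext" -> "File.000".
   The extension is taken to be everything after the last '.' in the final
   path component; a drive or directory prefix is kept as-is. */
void piece_name(const char *src, int n, char *out, size_t outsz)
{
    const char *base = src;   /* start of the final path component */
    const char *p;
    const char *dot;
    size_t stem_len;

    for (p = src; *p != '\0'; p++)
        if (*p == '\\' || *p == '/' || *p == ':')
            base = p + 1;

    dot = strrchr(base, '.');
    stem_len = (dot != NULL) ? (size_t)(dot - src) : strlen(src);

    /* Stem followed by a three-digit, zero-padded piece number. */
    snprintf(out, outsz, "%.*s.%03d", (int)stem_len, src, n);
}
```

For example, piece_name("File.Ext", 1, buf, sizeof buf) stores "File.001" in buf.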
The command specifiers for the "-Cn An" pairs are:
-B nnnn     Break after nnnn Bytes, where nnnn is a decimal number
-L nnnn     Break after nnnn Lines, where nnnn is a decimal number
-S string   Break after the next occurrence of "string"
-LB nnnn    Break at the first end-of-line after nnnn bytes
-LS string  Break at the first end-of-line after the next occurrence of "string"
-R          Repeat the last command specifier indefinitely
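One way to hold these specifiers after reading the command line is sketched below. The names spec, spec_kind and parse_spec are hypothetical. The sketch assumes option letters are case-insensitive (the examples use both -B and -b) and treats a count of zero or less as an error; this document does not say how BREAKUP itself handles either case.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory form of one "-Cn An" breakpoint specifier. */
enum spec_kind { SPEC_BYTES, SPEC_LINES, SPEC_STRING,
                 SPEC_LINE_BYTES, SPEC_LINE_STRING, SPEC_REPEAT };

struct spec {
    enum spec_kind kind;
    long count;        /* nnnn for -B, -L and -LB */
    const char *str;   /* string for -S and -LS (escapes not yet decoded) */
};

/* Parses the specifier at argv[*i] (plus its argument, if it takes one)
   into *s and advances *i past what it consumed.  Option letters are
   matched case-insensitively (an assumption).  Returns 0 on success,
   -1 on an unknown option or a bad or missing argument. */
int parse_spec(int argc, char **argv, int *i, struct spec *s)
{
    const char *a = argv[*i];
    char opt[3];              /* room for a two-letter option plus NUL */
    size_t n = 0;

    if (*a++ != '-')
        return -1;
    while (*a != '\0' && n < sizeof opt - 1)
        opt[n++] = (char)toupper((unsigned char)*a++);
    opt[n] = '\0';
    if (*a != '\0')
        return -1;            /* longer than any known option */

    s->count = 0;
    s->str = NULL;
    if (strcmp(opt, "R") == 0) {
        s->kind = SPEC_REPEAT;    /* -R takes no argument */
        *i += 1;
        return 0;
    }
    if (strcmp(opt, "B") == 0)       s->kind = SPEC_BYTES;
    else if (strcmp(opt, "L") == 0)  s->kind = SPEC_LINES;
    else if (strcmp(opt, "LB") == 0) s->kind = SPEC_LINE_BYTES;
    else if (strcmp(opt, "S") == 0)  s->kind = SPEC_STRING;
    else if (strcmp(opt, "LS") == 0) s->kind = SPEC_LINE_STRING;
    else return -1;

    if (*i + 1 >= argc)
        return -1;            /* argument missing */
    if (s->kind == SPEC_STRING || s->kind == SPEC_LINE_STRING) {
        s->str = argv[*i + 1];
    } else {
        char *end;
        s->count = strtol(argv[*i + 1], &end, 10);   /* decimal, per the table */
        if (end == argv[*i + 1] || *end != '\0' || s->count <= 0)
            return -1;
    }
    *i += 2;
    return 0;
}
```

Called in a loop starting at argv[2], it walks the "-Cn An" pairs from left to right, the same order in which the breaking points are applied.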
EXAMPLES:
BREAKUP File.Ext -b 1000 -b 1000
breaks "File.Ext" into three pieces. File.000 would contain the first 1000
bytes, File.001 would contain the second 1000 bytes, and File.002 would
contain everything else that was in File.Ext.
BREAKUP File.Ext -l 1000 -r
would chop File.Ext after every 1000 lines. (The last piece might be
smaller than 1000 lines, of course.)
BREAKUP File.Ext -l 200 -s Mom -s "Apple Pie"
breaks File.Ext at 3 points: at the (end of the) 200th line; at the next
occurrence of the string "Mom" in the text; and at the first occurrence of
the string "Apple Pie" after "Mom". (Quotes are optional and are not part
of the string searched for. They are required if the string contains one
or more blanks.)
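The break rules used in these examples can be modelled with a few small C functions. This is a simplified, hypothetical sketch, not BREAKUP's own code: it assumes the whole file is already in memory as a NUL-terminated string, and the function names are invented for illustration. Each function returns the offset just past the break, so the break point stays in the piece being finished, matching the inclusive behaviour described in the NOTES.

```c
#include <string.h>

/* Hypothetical model of the basic break rules (not BREAKUP's actual code).
   The whole file is assumed to be in memory as a NUL-terminated string.
   `pos` is where the current piece starts; each function returns the offset
   just past the break, so the break point stays in the current piece.  If the
   break point is never reached, the piece runs to the end of the text. */

/* -B n: break after n bytes (n assumed positive). */
size_t end_after_bytes(const char *text, size_t pos, long n)
{
    size_t left = strlen(text + pos);
    return (left > (size_t)n) ? pos + (size_t)n : pos + left;
}

/* -L n: break after the n-th end-of-line character. */
size_t end_after_lines(const char *text, size_t pos, long n)
{
    size_t i = pos;
    while (text[i] != '\0' && n > 0) {
        if (text[i] == '\n')
            n--;
        i++;
    }
    return i;
}

/* -S s: break right after the next occurrence of s. */
size_t end_after_string(const char *text, size_t pos, const char *s)
{
    const char *hit = strstr(text + pos, s);
    if (hit == NULL)
        return pos + strlen(text + pos);
    return (size_t)(hit - text) + strlen(s);
}

/* -L n -R: break after every n lines until the text runs out.
   Returns the number of pieces, or -1 if n is not positive. */
int pieces_every_n_lines(const char *text, long n)
{
    size_t pos = 0;
    int pieces = 0;

    if (n <= 0)
        return -1;
    while (text[pos] != '\0') {
        pos = end_after_lines(text, pos, n);
        pieces++;
    }
    return pieces;
}
```

Chaining calls reproduces the third example: end_after_lines(text, 0, 200) ends the first piece, end_after_string is then called from that offset with "Mom", and called again with "Apple Pie" from the offset it returns. The "-l 1000 -r" example corresponds to pieces_every_n_lines(text, 1000).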
NOTES:
1) Breaking at a point is inclusive. That is, breaking at 200 bytes
means the first piece will contain the 200th byte. Ditto for lines and
strings, i.e. breaking at "Mom" means the piece will end with "Mom".
2) The size of a file in bytes has two slightly different meanings.
To programs written in C (BREAKUP, FinalWord) the end of a line is marked by
a single character. Inside MSDOS, the end of a line is marked by the two
characters Carriage-Return and Line-Feed. Thus, breaking off a piece at
100 bytes may result in a file that (according to DIR) is slightly larger.
3) The -s strings may include control characters. Of course, you
can't just type the control characters as part of the -s string; MSDOS will
try to interpret them right away. So instead, BREAKUP uses a special
notation (borrowed from the C language) for control characters that always
begins with a "\" (backslash). Similarly, since " and \ already mean
something special, we must have a way to represent a single " or \.
These special notations are listed below.
\ddd is the character with the OCTAL value ddd. Must be 3 digits.
\\ is a single backslash
\" is a double-quote character
\n is a newline (end-of-line character)
The last sequence is particularly useful. Breaking at -s "\nA" would mean
"break at the next place where there is an A at the beginning of the line".
Warning: do NOT try to break at a null character, i.e. \000. Since the
C string routines use \0 as a string terminator, BREAKUP will not understand
its use as a breakpoint.
4) BREAKUP prints out the filenames of the pieces as they are produced.
You can redirect this output to a file, if you wish, by placing
>filename
after the list of breakpoint specifiers. (You do not need MSDOS 2.x to do
this.)
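The escape notation from note 3 can be decoded with a short C function, sketched below. decode_escapes is a hypothetical name and this is not BREAKUP's code. Two behaviours are assumptions that the notes do not specify: an unrecognised escape such as \q is copied through unchanged, and the first octal digit is limited to 0-3 so the value fits in one byte. The output buffer must be at least as long as the input, since decoding never lengthens the string.

```c
/* Hypothetical decoder for the -s escape notation (not BREAKUP's own code).
   Handles \ddd (exactly three octal digits, first digit 0-3 so the value fits
   in a byte), \\, \" and \n.  Any other backslash is copied through unchanged
   (an assumption).  `out` must have room for at least strlen(in) + 1 chars. */
void decode_escapes(const char *in, char *out)
{
    while (*in != '\0') {
        if (in[0] == '\\') {
            /* \ddd: three octal digits; the && chain stops at a NUL. */
            if (in[1] >= '0' && in[1] <= '3' &&
                in[2] >= '0' && in[2] <= '7' &&
                in[3] >= '0' && in[3] <= '7') {
                *out++ = (char)((in[1] - '0') * 64 + (in[2] - '0') * 8 + (in[3] - '0'));
                in += 4;
                continue;
            }
            if (in[1] == '\\' || in[1] == '"') {
                *out++ = in[1];      /* \\ or \" : the character itself */
                in += 2;
                continue;
            }
            if (in[1] == 'n') {
                *out++ = '\n';       /* \n : end-of-line character */
                in += 2;
                continue;
            }
        }
        *out++ = *in++;              /* ordinary character or unknown escape */
    }
    *out = '\0';                     /* terminate; a decoded \000 already ended the C string */
}
```

Decoding \000 stores a NUL byte, and every C string routine that later reads the result stops there, which is why note 3 warns against breaking at a null character.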