Designing DOS Filters
(PC Magazine Vol 3 No 20 Oct 16, 1984 M. Abrash/D. Illowsky)
The most useful utility programs are not necessarily the
most complex or powerful. A simple utility can be very handy if
it saves a few minutes a day, or if it lets you perform a needed
function with a minimum of effort. DOS versions 2.0 and higher
provide three programs in the form of filters MORE, FIND and SORT
that make it easy to manipulate data files and to pass information
between programs. Only a few filters are provided with PC-DOS,
but new features, such as enhanced batch file processing and the
redirection of I/O, make it a snap for you to design your own
filter programs for various uses.
We present two "home-made" filters: one filter guarantees
that all carriage returns in a file are paired with linefeeds,
while the other ensures that a file has an end-of-file marker.
These filters are elegantly simple and run with the speed of
assembly language, and are fully functional and easy to use.
DOS 2.0 lets the user send input to a program from any file,
just as if that input had been typed at the keyboard. This is
known as redirection of the standard input. The standard input
defaults to reading from the keyboard, but a less-than sign (<)
on the command line is all that's required to redirect the standard
input away from the keyboard. For example, the command line:
LINK < LINKFILE.DAT
runs the LINK program, taking the instructions for the linker from
the file LINKFILE.DAT.
The standard output from any program -- that is, the
interactive output that normally goes to the screen -- can likewise
be redirected to any file by using a greater-than sign (>). For
instance, the command line:
TREE > SUBDIR.LST
sends the list of all the subdirectories on the default disk to
the file SUBDIR.LST.
For both input and output, the default standard device is
the console device, CON:. On input, the console is the keyboard,
and on output, it is the video display.
A new feature introduced with DOS 2.0 is the filter. A filter
is a program that accepts information from the standard input,
modifies that data in some way, and then sends the transformed
information on to the standard output. For example, the FIND
filter, provided with DOS, accepts input from any text file and
passes on to the standard output only those lines of text that
contain the string of characters you specify. This allows you
to pick out certain lines of interest. Either one or both of a
filter's input and output may be redirected away from the console
to any file.
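The filter model described above can be sketched in a few lines of a
modern language. The following Python sketch only illustrates the idea
and is not the FIND program itself; the function name find_filter and
its parameters are hypothetical, and the streams are passed in
explicitly so that the example runs without any redirection.

```python
import io
from typing import TextIO


def find_filter(needle: str, source: TextIO, sink: TextIO) -> None:
    """Copy to `sink` only the lines of `source` that contain `needle`."""
    for line in source:
        if needle in line:
            sink.write(line)


# In-memory stand-ins for the standard input and output (assumed sample data).
source = io.StringIO("COPY A B\nTYPE A\nCOPY C D\n")
sink = io.StringIO()
find_filter("COPY", source, sink)
print(sink.getvalue(), end="")
```

Passing sys.stdin and sys.stdout in place of the in-memory streams
would let the same function be used with < and > on a command line.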
You can visualize a filter as sitting between the standard
input and standard output; it modifies the information passed
from the input to the output according to a unique set of rules.
As an example, you should look at one that filters all bare
carriage returns into carriage return/linefeed (CR/LF) pairs.
Many users have been frustrated trying to use a file with
only a bare carriage return marking the end of each line, rather
than the CR/LF pair that most DOS programs require. This problem
is particularly common when working with files transferred from
other computers via a modem or direct connection. For example,
files transferred from an Apple II typically contain no linefeeds
and cannot be properly listed or used with most IBM software
without being modified. In fact, both EDLIN and WordStar treat
such a file as if it consisted of one long line.
In the past, programs to fix files that contained bare
carriage returns could be written in BASIC, but these were
agonizingly slow. Alternatively, such programs could be written
in assembly language, but it was no small undertaking. The
redirection features and new functions provided by DOS 2.0 make it
simple to design a compact, easy-to-use filter program that changes
all bare carriage returns to CR/LF pairs with the speed of assembly
language. The great advantage of filters is that they make it easy
to massage information as it passes between programs and to perform
a whole series of file manipulations with a single command line.
A BASIC program is provided to create the filter program
CRLF.COM. After you run it, the file CRLF.COM will be present on
the default disk and ready for use.
To use the CR/LF filter, you redirect the input from the
file with bare carriage returns and redirect the output to the
file in which you want to store the corrected text. If you do
not redirect the output, the corrected text is displayed on the
screen. We strongly suggest that you don't filter a file back onto
itself because this action simply destroys the original file.
For example, if you try to type file BARECR.TXT, which has a
program listing with each line terminated with a bare carriage
return, then each line will overwrite the previous line because
there are no linefeeds to advance the cursor to the next row of
the screen. This is easily set right with the command line:
CRLF < BARECR.TXT
When executed, this command reads all the characters from the
file BARECR.TXT, changes all bare carriage returns to carriage
return/linefeed pairs, and sends the corrected text to the screen,
which is the default standard output. Because all carriage
returns have been paired with linefeeds, the text will display
legibly on separate lines.
Similarly, the command line:
CRLF < BARECR.TXT > CRLFPAIR.TXT
takes input from the file BARECR.TXT, passes it through the CR/LF
filter to correct all bare carriage returns, and sends the
corrected text on to the file CRLFPAIR.TXT. You can then use the
file CRLFPAIR.TXT as you would any normal DOS file.
That's really all there is to using the CR/LF filter. A
single command line, with redirection of the standard input and
output, ensures that every carriage return in any file is properly
paired with a linefeed. CR/LF works well with the piping features
of DOS 2.0 as well.
One nice feature of the CR/LF filter is that any carriage
return that is properly paired with a linefeed is left alone. You
can filter either a normal file or one that has both bare and
paired carriage returns, and no harm will be done to the carriage
returns already paired. However, some programs that set high
bits may make linefeeds unrecognizable to CR/LF. Files created by
such programs should first be passed through another filter to
strip the high bits. Alternatively, you could modify CR/LF to
ignore high bits.
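The high-bit problem is easy to see at the byte level. The sketch
below is written in Python rather than in the assembly language used
for the filters, and it assumes that "setting the high bit" means
setting bit 7 of a character code. The helper name strip_high_bits
and the sample bytes are hypothetical.

```python
CR = 0x0D
LF = 0x0A


def strip_high_bits(data: bytes) -> bytes:
    """Clear bit 7 of every byte, leaving plain 7-bit character codes."""
    return bytes(b & 0x7F for b in data)


# "HI" followed by a CR/LF pair whose bytes have bit 7 set (assumed sample).
marked = bytes([0x48, 0x49, CR | 0x80, LF | 0x80])

print(LF in marked)             # is an ordinary linefeed byte present?
print(strip_high_bits(marked))  # the same bytes with bit 7 cleared
```

Clearing bit 7 also changes any character whose code is legitimately
above 127, so a filter of this kind suits only plain ASCII text.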
Let's create a small file with only bare carriage returns so
that we can see why the CR/LF filter is needed and how it works.
Use the DEBUG session shown below to create the file TESTCRLF.DAT
on the default disk, containing four lines of text -- each
terminated with a bare carriage return. To verify that there are
no linefeeds in this file, enter the command line:
TYPE TESTCRLF.DAT
You will see that text lines display one atop the other, so only
the last line is visible. If you edit this file, you may find it
does not display properly; EDLIN, for example, does not treat the
lines as separate.
Now enter the command line:
CRLF < TESTCRLF.DAT
to pass this file through the CR/LF filter and send it to the
screen. The file will display correctly because a linefeed is
inserted at the end of each line.
To create a corrected version of the file TESTCRLF.DAT, you
should enter the command line:
CRLF < TESTCRLF.DAT > CORRECT.DAT
The filtered output, with all carriage returns properly paired
with linefeeds, is stored in the file CORRECT.DAT. You can edit
or display this file as you would any normal text file.
The procedure is just as simple for any file of any size.
Just redirect the input from the file that contains bare carriage
returns and redirect the output to the file in which you want the
corrected text to be placed.
A handy feature of the CR/LF filter is that it inserts an
end-of-file (EOF) marker at the end of any file that lacks one.
Ctrl-Z (value 26, or hexadecimal 1A) is generally used to mark
the end of text files. Most text editors and word processors
look for this EOF marker when they load a file, but EDLIN is an
exception to this rule. However, not all files contain an end-
of-file marker; for instance, files created with the COPY CON:
command and those created with the DEBUG program lack the EOF
marker. If the marker is not present, most programs assume that
all of the last sector of information read from the disk is a
valid part of the file, but it is not.
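To see why a missing marker matters, consider how a program that
honors Ctrl-Z decides where the text stops. The following Python
sketch is an illustration only; the function name text_before_eof and
the filler bytes standing in for the unused part of the last sector
are assumptions and are not taken from any particular editor.

```python
EOF_MARK = b"\x1a"  # Ctrl-Z


def text_before_eof(raw: bytes) -> bytes:
    """Return the bytes that precede the first Ctrl-Z, or everything if none."""
    end = raw.find(EOF_MARK)
    return raw if end == -1 else raw[:end]


filler = b"\x00" * 4  # assumed leftover bytes after the real text
unmarked = b"LAST LINE\r\n" + filler
marked = b"LAST LINE\r\n" + EOF_MARK + filler

print(text_before_eof(unmarked))  # no marker: the filler is treated as text
print(text_before_eof(marked))    # marker present: the text stops before it
```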
If the character Ctrl-Z (hexadecimal 1A) is not the last
byte of any file filtered with CR/LF, then a Ctrl-Z is added to
the end of that file so that it can be edited properly. For
example, put a disk with space for a file in the default drive
and enter the command lines:
COPY CON: NOEOF.DAT
THIS FILE IS NOT
TERMINATED WITH AN EOF MARKER
and strike the F6 key. The file NOEOF.DAT is now created, with
no EOF marker. To verify that the EOF marker is missing, edit
NOEOF.DAT with your favorite word processor or editor (e.g.,
WordStar), and you'll probably see a row of "@" characters at the
bottom of the file. These characters are garbage and do not
properly belong in the file, but they are loaded because no EOF
marker was present to tell the software where the file ended.
Now pass the file through the CR/LF filter with the command:
CRLF < NOEOF.DAT > ISEOF.DAT
This creates the file ISEOF.DAT, which is identical to the file
NOEOF.DAT except for a Ctrl-Z to mark the end of the file. Then
if you edit the file ISEOF.DAT, you will see that the garbage at
the end of the file has been eliminated.
Because CR/LF can improperly modify files created by
programs that set high bits, it is not an ideal tool for simply
ensuring the EOF marker is present. For that job, a BASIC program
is provided to create the filter program MARKEOF.COM, which does
nothing but add an EOF marker to the end of any file lacking one.
As an example of the use of the MARKEOF filter, you can place
an EOF marker at the end of the NOEOF.DAT file that we created
above with the command line:
MARKEOF < NOEOF.DAT > ISEOF.DAT
Apart from the possible addition of a Ctrl-Z as an EOF marker, no
change is made to the text of the filtered file.
Figure: Creating TESTCRLF.DAT with DEBUG. If any response differs
from that shown (other than the segment address 6BF8),
exit with the "Q" command and start over.
A>DEBUG
-F 100 L1C "line 1" 0D "line 2" 0D "line 3" 0D "line 4" 0D
-D100 11B
6BF8:0100 6C 69 6E 65 20 31 0D 6C-69 6E 65 20 32 0D 6C 69
6BF8:0110 6E 65 20 33 0D 6C 69 6E-65 20 34 0D
-RCX
CX 0000
:1C
-RBX
BX 0000
:0
-N TESTCRLF.DAT
-W
Writing 001C bytes
-Q
A>
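For readers without DEBUG at hand, the same 28-byte (hex 1C) test
data can be produced another way. This Python sketch assumes the
lowercase text that appears in the hex dump of the figure; the
function names build_test_data and write_test_file are hypothetical.

```python
from pathlib import Path

CR = b"\r"


def build_test_data() -> bytes:
    """Four lines of text, each ended by a bare carriage return."""
    lines = [b"line 1", b"line 2", b"line 3", b"line 4"]
    return b"".join(line + CR for line in lines)


def write_test_file(path: str = "TESTCRLF.DAT") -> int:
    """Write the test data to `path` and return the number of bytes written."""
    return Path(path).write_bytes(build_test_data())


data = build_test_data()
print(hex(len(data)))  # length of the data, for comparison with the CX value
print(b"\n" in data)   # reports whether any linefeed is present
```

Calling write_test_file() creates the file in the current directory.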
-----------------------------------------------------------------
Custom-Made DOS Filters
(PC Magazine Vol 3 No 21 Oct 30, 1984 M. Abrash/D. Illowsky)
The design of the MARKEOF filter is simpler than the CR/LF
filter's, so we'll examine its assembly language source code
first (Figure 1).
The key to the MARKEOF filter is the use of the DOS functions
3F (hex) and 40 (hex). Function 3F reads one or more characters,
and function 40 writes one or more characters. Each function is
called by placing the number of bytes to be read or written in
register CX, the address of the buffer that receives or supplies
the bytes in register DX, and the function number in register AH.
Register BX holds a file handle, which identifies the file or
device to be read from or written to.
DOS lets you set up a file handle to refer to any file in any
subdirectory, but we'll use only a small part of the file-handle
feature. DOS provides several built-in file handles that are
always automatically available. Two of these built-in file
handles refer to the standard I/O, and that's all we need.
If register BX contains 0 (for file-handle number zero), the
standard input is used; if register BX contains 1, the standard
output is used. When registers AH, BX, CX and DX are set, DOS
is invoked to execute the function with software interrupt 21 (hex).
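The handle-based calls described here resemble the low-level read and
write calls of other systems. As a rough analogy only, this Python
sketch uses os.read and os.write with numeric handles much as a
filter uses functions 3F and 40. The function name copy_handle is
hypothetical, and the comments say which register each argument
loosely corresponds to.

```python
import os

STDIN_HANDLE = 0   # handle for the standard input (BX = 0)
STDOUT_HANDLE = 1  # handle for the standard output (BX = 1)


def copy_handle(src: int, dst: int) -> int:
    """Copy bytes one at a time from `src` to `dst`; return the count copied."""
    count = 0
    while True:
        chunk = os.read(src, 1)  # handle ~ BX, byte count ~ CX
        if not chunk:            # nothing read: the input is out of text
            return count
        os.write(dst, chunk)     # handle ~ BX, data ~ the buffer at DX
        count += 1
```

Calling copy_handle(STDIN_HANDLE, STDOUT_HANDLE) copies the standard
input to the standard output unchanged, which is the skeleton on
which a filter is built.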
MARKEOF is interested in only the last character of the input
file, so it just loops continually, passing characters from the
standard input to the standard output, until the last character
is reached; then an end-of-file marker is inserted if none is
present. This loop extends across lines 29 to 51 (Figure 1).
(Line numbers are used only for explanatory purposes and should
not be included when entering the program.) Lines 30 through 36
read a character from the standard input with function 3F. Lines
43 through 49 immediately write the character to the standard
output with function 40. This loop continues until function 3F
returns a 0 in register AX, indicating that the standard input
has no more text.
When the standard input runs out of text, MARKEOF uses lines
57 through 60 to check the last character read from the standard
input. If the character is Ctrl-Z (hex 1A), then the end of the
file is properly marked, and MARKEOF is done. If the character
is not Ctrl-Z, then the end is unmarked, and a Ctrl-Z character
is appended to the standard output by lines 65 through 71 so
other programs will correctly detect the end of the file. Finally,
DOS function 4C (hex) ends MARKEOF and returns control to DOS.
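The logic just described can be restated as a short Python sketch. It
is not a translation of the listing in Figure 1, only a model of the
same loop; the function name mark_eof is hypothetical. Unlike the
listing, the sketch also appends a marker when the input is empty,
which is a choice made here for illustration.

```python
import io
from typing import BinaryIO

EOF_MARK = b"\x1a"  # Ctrl-Z


def mark_eof(source: BinaryIO, sink: BinaryIO) -> None:
    """Copy `source` to `sink`, then append Ctrl-Z unless it was the last byte."""
    last = b""
    while True:
        char = source.read(1)  # read one character, as function 3F does
        if not char:           # nothing read: the input has no more text
            break
        sink.write(char)       # write it at once, as function 40 does
        last = char
    if last != EOF_MARK:
        sink.write(EOF_MARK)


sink = io.BytesIO()
mark_eof(io.BytesIO(b"THIS FILE IS NOT\r\nTERMINATED\r\n"), sink)
print(sink.getvalue())
```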
The command sequence shown in Figure 2 both assembles and links
the file MARKEOF.ASM, as shown in Figure 1, into the executable
filter program MARKEOF.COM. You must have the IBM Macro Assembler
(the program MASM.EXE) in order to assemble this program, and you
will need to use the LINK and EXE2BIN programs provided with DOS.
LINK will produce the error message "Warning: No STACK segment."
This is of no concern, since the program uses the STACK segment
set by DOS.
The assembler source listing for the CR/LF filter is shown
in Figure 3. The bulk of CR/LF is simply the loop from MARKEOF,
in which each character is read from the standard input and sent
to the standard output. However, carriage returns are handled
specially on lines 50 through 64.
After a carriage return has been sent to the standard output,
a linefeed is automatically sent to the standard output as well.
This ensures that all carriage returns are paired with linefeeds.
Of course, if the next character from the standard input is a
linefeed, which means that the carriage return was already
paired, then there would be two linefeeds. To avoid this, the
character following the carriage return is read on line 54.
If the next character is a linefeed, it is discarded; thus, the
carriage return remains paired with a single linefeed. If the
next character is not a linefeed, then the carriage return was
bare and now has been corrected, and, as a result, the next
character is saved.
The pairing of every carriage return with only one linefeed
is the sole difference between CR/LF and MARKEOF and is the only
modification made to the text from the standard input as it flows
through the filter. When all the text has been filtered, lines
91 through 102 of Figure 3 guarantee that a Ctrl-Z is present to
mark the end of the file. Finally, DOS function 4C ends the
program.
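The carriage-return handling can be modeled the same way. The Python
sketch below follows the flow described above (write the character,
add a linefeed after every carriage return, and discard a linefeed
that was already there), but it is an illustration and not a
translation of Figure 3. The function name crlf_filter is
hypothetical.

```python
import io
from typing import BinaryIO

CR = b"\r"
LF = b"\n"
EOF_MARK = b"\x1a"  # Ctrl-Z


def crlf_filter(source: BinaryIO, sink: BinaryIO) -> None:
    """Copy `source` to `sink`, pairing every CR with exactly one LF."""
    last = b""
    char = source.read(1)
    while char:
        last = char
        sink.write(char)
        if char == CR:
            sink.write(LF)             # every CR gets a linefeed
            char = source.read(1)      # look at the character after the CR
            if char == LF:             # the CR was already paired:
                last = char
                char = source.read(1)  # discard the LF and read on
            # otherwise the character is handled by the next pass of the loop
        else:
            char = source.read(1)
    if last != EOF_MARK:
        sink.write(EOF_MARK)           # make sure the end of file is marked


sink = io.BytesIO()
crlf_filter(io.BytesIO(b"line 1\rline 2\r\nline 3\r"), sink)
print(sink.getvalue())
```

Filtering already-corrected text a second time leaves it unchanged,
which matches the behavior the article describes for paired carriage
returns.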
The command sequence shown in Figure 4 assembles and links
the file CRLF.ASM, as shown in Figure 3, into the executable filter
program named CRLF.COM.
After looking at the MARKEOF and CR/LF filters, you can see
how easily a file can be modified with filters running under DOS
2.0 and how simple it is to make these filters. With the new
redirection features of DOS 2.0, filter programs can be written in
assembly language without file control blocks, open and close
functions, and complex function calls. With DOS 2.0, even the
neophyte assembly language programmer can easily design his own
custom filter programs.
- - - - - - - - - -
Figure 1: The assembly language listings for MARKEOF.
[1] ;*
[2] ;* Assembly-language source code listing for MARKEOF,
[3] ;* a filter to copy the standard input
[4] ;* to the standard output, making sure that the
[5] ;* text is terminated with Ctrl-Z (hex 1A) to
[6] ;* mark the end of the file.
[10] cseg segment
[11] assume cs:cseg,ds:cseg
[12] org 100h ;COM files start at offset 100h
[13] markeof proc far
[14] jmp short read_char
[16] ; Equates and storage area
[18] eof equ 1ah ;Ctrl-Z character that marks
[19] ; the end of a text file
[20] tchar db ? ;temporary storage for
[21] ; character read from standard input
[22] end_of_file db eof ;storage for end-of-file marker
[24] ; Top of loop to read a character from the standard input
[25] ; and write it to the standard output.
[26] ; Read the next character from the standard input.
[29] read_char:
[30] sub bx,bx ;file handle for the standard input
[31] mov cx,1 ;one character is to be read
[32] mov dx,offset tchar ;character read is to be
[33] ; stored in tchar
[34] mov ah,3fh ;we want DOS function 3F (hex),
[35] ; which reads a character
[36] int 21h ;invoke DOS to read a character
[37] ; from the standard input
[38] and ax,ax ;is the standard input out of text?
[39] jz done ;if so, then finish up
[41] ; Write the character to the standard output.
[43] mov bx,1 ;file handle for the standard output
[44] mov cx,bx ;one character is to be written
[45] mov dx,offset tchar ;character to be written is
[46] ; stored in tchar
[47] mov ah,40h ;we want DOS function 40 (hex),
[48] ; which writes a character
[49] int 21h ;invoke DOS to write a character
[50] ; to the standard output
[51] jmp short read_char ;read the next character
[53] ; All text transferred - add an end of file marker
[54] ; if none exists.
[56] done:
[57] cmp [tchar],eof ;was the last character read
[58] ; the end of file marker?
[60] jz eof_set ;if so, then we're done
[62] ; The last character was not an end of file marker, so add
[63] ; the marker to the standard output.
[65] mov bx,1 ;file handle for standard output
[66] mov cx,bx ;one character is to be written
[67] mov dx,offset end_of_file ;end-of-file marker
[68] ; to be written is
[69] ; stored here
[70] mov ah,40h ;DOS function 40 (hex) to write
[71] int 21h ;invoke DOS to write the end of file
[72] ; marker to the standard output
[74] ; The end-of-file marker is all set, so we're done.
[76] eof_set:
[77] mov ah,4ch ;DOS function 4C (hex) to terminate
[78] int 21h ;invoke DOS to end the program
[79] markeof endp
[80] cseg ends
[81] end markeof
- - - - - - -
Figure 2: Assemble, link and conversion steps for making the
source code of the filter MARKEOF, which is stored in
the file MARKEOF.ASM, into the runnable filter program
MARKEOF.COM.
A>MASM MARKEOF;
The IBM Personal Computer MACRO Assembler
Version 1.00 (C)Copyright IBM Corp 1981
Warning Severe
Errors Errors
0 0
A>LINK MARKEOF;
IBM Personal Computer Linker
Version 2.00 (C)Copyright IBM Corp 1981, 1982, 1983
Warning: No STACK segment
There was 1 error detected
A>EXE2BIN MARKEOF.EXE MARKEOF.COM
A>ERASE MARKEOF.EXE
- - - - - -
Figure 3: The assembly language listings for the filter program
CR/LF.
[1] ;* Assembly-language source code for CRLF, a filter to copy
[2] ;* the standard input to the standard output, making sure
[3] ;* that every carriage return is paired with a linefeed,
[4] ;* as with normal DOS files. Also, Ctrl-Z (hex 1A) is
[5] ;* added to mark the end of the file if no end-of-file
[6] ;* marker is present.
[12] cseg segment
[13] assume cs:cseg,ds:cseg
[14] org 100h ;COM files start at offset 100h
[15] crlf proc far
[16] jmp short read_char
[18] ; Equates and storage area.
[20] cr equ 0dh ;carriage return character
[21] lf equ 0ah ;linefeed character
[22] eof equ 1ah ;Ctrl-Z character
[24] tchar db ? ;temporary storage for character
[25] ; read from standard input
[26] linefeed db lf ;storage for linefeed character
[27] end_of_file db eof ;storage for end-of-file marker
[29] ; Top of loop to read a character from standard input and
[30] ; write it to the standard output, making sure that all
[31] ; carriage returns are paired with a linefeed.
[33] read_char:
[34] call read1 ;get the next character from
[35] ; the standard input
[36] save_char:
[37] mov dx,offset tchar ;point to character read
[38] call write1 ;write it to the standard
[39] ; output
[40] cmp [tchar],cr ;was the character a CR?
[42] jz handle_cr ;if so, make sure it is
[43] ; paired with a linefeed
[44] jmp short read_char ;read the next character
[46] ; Make sure that a CR is followed by a LF.
[49] handle_cr:
[50] mov dx,offset linefeed ;point to the LF character
[52] call write1 ;write a LF for the CR
[54] call read1 ;get the next character, to
[55] ; make sure that the CR was
[56] ; not already paired with a
[57] ; linefeed
[59] cmp [tchar],lf ;is the next character from
[60] ; the standard input a LF?
[62] jnz save_char ;if it is not a LF, then
[63] ; save it normally
[64] jmp short read_char ;if it is a LF, then read
[65] ; the next character; the
[66] ; LF just read is discarded
[68] crlf endp
[70] ; Read the next character from the standard input, checking
[71] ; whether the standard input has run out of text.
[74] read1 proc near
[75] sub bx,bx ;file handle for the standard input
[76] mov cx,1 ;one character is to be read
[77] mov dx,offset tchar ;character read is to be
[78] ; stored in tchar
[79] mov ah,3fh ;we want DOS function 3F (hex),
[80] ; which reads a character
[81] int 21h ;invoke DOS to read a character from
[82] ; the standard input
[83] and ax,ax ;is the standard input out of text?
[84] jz done ;if so, then finish up
[85] ret
[87] ; All text transferred - add an EOF marker if none exists
[90] done:
[91] cmp [tchar],eof ;was the last character read
[92] ; the EOF marker?
[94] jz eof_set ;if so, then we're done
[96] ; The last character was not an EOF marker, so add the
[97] ; marker to the standard output.
[99] mov dx,offset end_of_file ;EOF marker to be
[100] ; written is stored
[101] ; here
[102] call write1 ;write the EOF marker
[104] ; The EOF marker is all set, so we're done.
[106] eof_set:
[107] mov ah,4ch ;DOS function 4C to terminate
[108] int 21h ;invoke DOS to end the program
[109] read1 endp
[111] ; Write the character pointed to by register DX to the
[112] ; standard output.
[114] write1 proc near
[115] mov bx,1 ;file handle for the standard output
[116] mov cx,bx ;one character is to be written
[117] mov ah,40h ;we want DOS function 40 which
[118] ; writes a character
[119] int 21h ;invoke DOS to write a character to
[120] ; the standard output
[121] ret
[122] write1 endp
[123] cseg ends
[124] end crlf
- - - - - - -
Figure 4: Assemble, link and conversion steps for making the
source code of the filter CR/LF, which is stored in
the file CRLF.ASM, into the runnable filter program
CRLF.COM.
A>MASM CRLF;
The IBM Personal Computer MACRO Assembler
Version 1.00 (C)Copyright IBM Corp 1981
Warning Severe
Errors Errors
0 0
A>LINK CRLF;
IBM Personal Computer Linker
Version 2.00 (C)Copyright IBM Corp 1981, 1982, 1983
Warning: No STACK segment
There was 1 error detected
A>EXE2BIN CRLF.EXE CRLF.COM
A>ERASE CRLF.EXE
-----------------------------------------------------------------