[File; TRANSGID.TXT Revision date; April 23, 1990]
A SHORT GUIDE TO NETWORKING AND FILE TRANSMISSION
Erich Neuwirth
Institute of Statistics and Computer Science
University of Vienna
Austria
(A4422DAB@AWIUNI11.BITNET)
GENERAL PRINCIPLES OF SENDING FILES IN ELECTRONIC NETWORKS
Networking is mainly used in 2 ways:
Electronic mail
Sending (binary) files
This paper tries to explain what some of the differences are and how
one of the two transmission methods sometimes can be (mis)used for
tasks which seem to belong to the other method.
Electronic Mail
Electronic mail means you are sending text from one computer site to
another site. Letters of text are coded as numbers internally within
computers. Problems arise from the fact that the same letter may be
represented by different numbers on different computer systems and vice
versa the same number may yield a different letter on different computer
systems. Mostly we are concerned with two such representation systems
for letters by numbers.
ASCII (which is used on IBM-compatible PCs and on most non-IBM
mainframe computers)
EBCDIC (which is used on IBM (and compatible) mainframe computers)
When you are sending text from one computer to another computer the
computers "think" they only are sending numbers. People reading or
writing text, on the other hand, expect characters, so some
interpretation of the numbers producing the text must take place. Simply
transferring the text file as a sequence of numbers (which is what it
looks like to the computers involved) would result in an unreadable file
on the receiving computer system. Therefore when using computers with
different character representation systems the transmission usually
involves a "translation process" which has the net effect of yielding a
different "sequence of numbers" (= file) on the receiving machine, but
this file usually gives the same letters when read as a text file.
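As a small illustration of this translation process, the following sketch prints the numbers standing for the same characters in ASCII and in EBCDIC. Python and the code page name "cp037" are assumptions made for this example only; cp037 is just one of several EBCDIC variants and the one used on a given mainframe may differ.

```python
# The same characters are stored as different numbers in ASCII and EBCDIC.
# "cp037" is one EBCDIC code page known to Python (an assumption here;
# real mainframes may use another variant).

def codes(text, encoding):
    """Return the list of numbers representing text in the given encoding."""
    return list(text.encode(encoding))

def translate(data, source, target):
    """Produce a different sequence of numbers that gives the same letters."""
    return data.decode(source).encode(target)

if __name__ == "__main__":
    for ch in "Aa1[":
        print(repr(ch), "ASCII:", codes(ch, "ascii"), "EBCDIC:", codes(ch, "cp037"))
```

The function translate() is the "translation process" in miniature: the numbers change, the letters stay the same.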
Usually these translation processes work quite well for letters
(lowercase and uppercase) and digits. Quite often you will encounter
problems with special characters like parentheses, brackets, tildes,
carets and so on. If you are interested in merely transferring texts
this is not much of a problem, because even if some special characters
get scrambled it is usually not too hard to reconstruct the original
text by normal editing. If you are setting up a new communications link
it is a good idea to send a file containing all printable characters
with descriptions and to test if they arrive at the other end as they
should. At the end of this paper you will find an example of how such a
test file could look. Of course such a file should be sent from both
ends of the line because the scrambling process in many cases is
asymmetrical, so different transpositions happen in the two different
communication directions. Closely inspecting the file you receive will
show you which characters are changed during the transmission process.
Now three different events can happen:
1) You receive all the characters as they should be:
Action: Don't worry, be happy
2) Some characters are not what they should be, but different characters
still are different (even when not identical with their original)
Action: Do worry, but not too much. In this case you can use the FIND
and REPLACE function of your text editing program to restore the
original meaning of the file. You even could program a macro in your
text editor (if you don't know what that means just ignore this
sentence) which automatically performs the "retranslation" process.
3) Some characters are scrambled and different characters in the source
text file come out as identical characters at the receiving end.
Action: Do worry, because this is the worst possible situation. It is
not possible to construct an automatic "retranslation" process. As
long as you are only concerned with text you will not have too many
problems, because letters, digits, commas and periods usually are not
scrambled when sent between different computer systems. If these
characters also are scrambled the transmission process does not
deserve the name "communication process" any more and you should talk
to the technical people in charge of the transmission channel to take
care of these problems.
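For readers who can program, the three cases above can be told apart mechanically. The following is only a sketch in Python (not a tool mentioned in this paper); the function names are made up for this example, and it assumes that the test text and the received text have the same length, i.e. that characters were changed but neither added nor dropped.

```python
def classify(sent, received):
    """Compare a test text with what arrived at the other end.

    Returns (case, table). case is 1, 2 or 3 as in the list above.
    table maps each changed character back to its original and is
    only filled in for case 2.
    """
    if len(sent) != len(received):
        raise ValueError("texts differ in length; compare them by hand")
    forward = {}
    for s, r in zip(sent, received):
        forward.setdefault(s, r)          # first observed translation wins
    changed = {s: r for s, r in forward.items() if s != r}
    if not changed:
        return 1, {}                      # case 1: everything arrived intact
    if len(set(forward.values())) < len(forward):
        return 3, {}                      # case 3: two originals look alike
    return 2, {r: s for s, r in changed.items()}

def retranslate(text, table):
    """Apply the table from case 2, like FIND and REPLACE done in one pass."""
    return text.translate(str.maketrans(table))
```

The table is only trustworthy for characters that actually occurred in the test text, which is one more reason to send a test file containing all printable characters.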
Things become more difficult when you want to send data files or program
source files. Files of this kind usually contain special characters
like parentheses and to reconstruct the original text of the file you
usually have to edit the file you received by hand and to infer from the
context the original meaning of a recognizably incorrect character.
The automatic file transfer usually takes place between mainframe
computers. So the simplest situation with text file transfer is that
you use the editor on your mainframe computer to create your text and
then you use the mailing program on the mainframe to send the text file
(sometimes called e-mail or note) to its destination. At the
destination site the receiver then can receive the file and read it with
the help of the text editor program on the receiving mainframe computer.
Sometimes the situation is more difficult. The file you want to send
may exist on your PC, but not yet on the mainframe which is your
entrance to the international computer networks.
There is an important detail you have to take care of here.
Usually you can write texts on a PC using two different kinds of
programs to write with:
Text editor programs or
word processing programs
Text files produced by text editing programs usually give no problems
when you try to send them over a network. With most word processor files
you will experience difficulties. But most word processing programs have
a special way of saving your text as a "plain ASCII file". Remember to
save your texts with this option if you intend to send them over
networks. And if you are still considering which word processing program
you should select for your personal use, only select a program which
offers this option. If you do not know yourself how to verify the
existence of such an option ask somebody more experienced than you to
help you to find out.
Now you have to find a way to transfer the file from your PC to your
mainframe computer. For this purpose you need a file transfer program on
the PC and on the mainframe. Different varieties of programs of this
kind exist, but the prevalent program in an academic environment at the
moment is KERMIT. To use KERMIT to transfer files you need the version of
KERMIT for your PC and an installed version of KERMIT on the mainframe.
The mainframe KERMIT is not your responsibility; you just have to
find out from the staff of your computing center if they already have
installed this program. If they have not done so yet you should tell them
to do so because KERMIT is one of the very few hardware independent
standards and it should be supported. Additionally, all KERMIT versions
are in the Public Domain, so they do NOT COST MONEY. Your local
computing center also should help you to find the version of KERMIT you
need for your PC.
KERMIT is a program used for 2 purposes; namely for using your PC as a
terminal to your mainframe computer and for transferring files between
these two systems.
Now things start to be complicated (even more complicated? I hear you
complain!).
In this paper we will not deal with using KERMIT as a terminal emulator.
There are many ways to do this and it mainly depends on which kind of
mainframe you are using. You should try to get some help from the people
from your local computing center who can show you exactly how to use
KERMIT for this purpose.
An additional remark: If you only want to use KERMIT as a "terminal
emulator", which means using your PC as a terminal, you do not need
KERMIT on the mainframe computer you are connecting to. The mainframe
version is only needed for file transfer between the mainframe and your
PC.
Now things become really complicated! The PC KERMIT has only one way of
transferring files. But the mainframe version usually has two ways
(called "modes" by computer scientists). One way is text mode, the other
way is binary mode. Text mode is used to transfer text files. E-mail
consists of text files so it is this mode you need for downloading
e-mail from your mainframe to your PC. Usually you need not care too
much because practically all mainframe versions of KERMIT use text mode
for file transfer if not told otherwise explicitly.
So simply transferring a text file from your PC to the PC of somebody
else you want to send it to can be done using the following steps:
1) Upload the text file from your PC to your mainframe with KERMIT in
text mode
2) Use the mail facilities of your mainframe to send the text file as
mail to the intended receiver
3) The receiver finally has to download this mail file (it still is
text) with KERMIT in text mode to his/her PC
In most cases the received file is identical with the original file.
Letters and digits arrive as they should.
The idea behind text mode of KERMIT is that the meaning of characters is
preserved, so when transferring in text mode KERMIT automatically
adjusts for different systems of character representations on the
mainframe and on the PC.
You might find that some of the special characters do not arrive as they
should, but this usually is no problem when the text is only intended
for reading and not as input to some computer program.
Later we will see what you can do if you have to send a text file
containing special characters and want to make sure that these
characters arrive unchanged.
TRANSFERRING NON-TEXT FILES
It is becoming even more difficult in this section, but if you want to
send programs and data files usable on other machines it is important
that you understand this section.
Networks can also be used to send PC programs over the network. If you
want to send a program to somebody with the same kind of PC you have, the
basic procedure is very much like the procedure for transferring text
files from your PC via the network to somebody else's PC.
The steps involved are:
Uploading to a mainframe
Using the sending facilities of the network
Downloading from the target mainframe to the target PC
The difficulties arising with program files are that programs contain
more different symbols than text files. They especially contain lots of
so called "nonprintable" characters. You can see this if you try to
look at your program file with a text editor program or a word
processing program.
The simplest solution to transferring program files and like things
(called binary files in computer terminology) is to use the binary
transfer mode of your mainframe KERMIT to upload the program to your
mainframe. Binary mode means that no translation whatsoever takes place
while sending the file (remember, sending text files often involves a
translation process). Now you can use the facilities of your mainframe
for sending files over the network. Sending a file is not the same as
sending a text as mail. Mailing implies that your text is put into the
electronic equivalent of an envelope. Sending a file does not add the
envelope, so the file being sent is (almost) identical with what you
have on your PC. The receiver then can download the file to his/her PC
also using the binary transfer mode of his/her mainframe KERMIT and the
PC version of KERMIT.
This file transfer quite often does not work. Some reasons may be: the
two mainframes involved come from different manufacturers, some
intermediate mainframe causes problems or the file is passing through
different networks. One situation where it makes sense to try this way
of sending binaries is when both mainframes are members of the EARN,
BITNET or NETNORTH networks. It usually does not work when the
mainframes belong to different networks like EARN and JANET.
Now what can we do when we want to send a program or a data file from
an EARN site to a JANET site?
The main idea is translating your binary file (the one you cannot read
because it contains nonprintable characters) into a file consisting only
of printable characters. The most popular scheme for doing such a
translation is the UUENCODE/UUDECODE process. It implies 2 programs,
one usually called UUENCODE and the other one UUDECODE. UUENCODE takes
a binary file and converts it into a file consisting only of printable
characters. UUDECODE reverses this process and restores the original
binary files from the encoded file. So what do you need these programs
for?
You UUENCODE the binary file and upload it to your mainframe (using the
text mode of your mainframe KERMIT). Since it consists of printable
characters only, you can incorporate it into a mail file you send. This
mail file hopefully arrives at its destination and the receiver can
download the mail from his/her mainframe to the local PC. Then it is
mandatory to remove the "electronic envelope" from the mail file. An
appendix will describe how a UUENCODEd file looks and how to recognize
the parts forming the "envelope". Then the UUDECODE program can be used
to translate the UUENCODEd version of the file back into its binary
version.
If you want to use this process you have to get hold of a copy of the
UUENCODE and UUDECODE program. It is not possible (at least not in an
easy way) to send these programs over networks if you have no experience
with encoding and decoding binary files. These programs are binary files
themselves and we cannot send unencoded binary files. So we would need
the binary files already to translate the encoded versions into the
binary version. It is a "which came first, the chicken or the egg" kind of
situation. There are ways of solving these problems, but the solutions
involve a nontrivial amount of technical knowledge and also depend very
much on the circumstances of the PCs and mainframes involved.
(For the more technically inclined: we could send the source files of
the translation programs as text files, but then we have to be sure that
the recipient has a compiler for the programming language we are using.)
So quite often the easiest way of setting up an environment where file
transfer is possible involves sending a disk with the UUENCODE/UUDECODE
programs to the sites involved. Once the programs are available file
transfer can start.
Now let us look at what a UUENCODEd file looks like:
------- the file starts directly below this line ------------
begin 644 erich.com
MZV>0("`@("`@("`@("`@("`@4V\@>6]U(&UA;F%G960@=&\@555$14-/1$4@
M=&AE('1E<W0@9FEL92X-"B`@("`@("`@("`@("`@("`@("`@("`@0V]N9W)A
;='5L871I;VYS(2$-"B0:N@,!M`G-(;@`3,TA
`
end
------ the UUENCODED file ended just above this line -------
The first line always contains the word 'begin' starting in the first
column. The next item is a number which you can safely ignore and the
last item is the name of the UUENCODEd file. The last line of the
encoded file consists of the word "end" starting in the first column and
nothing else. Some encoding programs add a line containing size
information about the encoded file, but this is not really necessary.
If you use the UUENCODing program on your PC the encoded version of the
file usually has the same first part of the file name as the file being
encoded and the file extension .UUE So encoding a program ERICH.COM
would produce a file ERICH.UUE . This file ERICH.UUE is the one that
should be uploaded and sent using the mail facilities of the network.
At the receiving site the mail file sent can be downloaded to the PC.
The downloaded file usually looks similar to the following example:
---------------- this line is not part of the file -----------
Date: Sat 14 Jan 89 06:51:59-EST
From: John R. Somebody <SOMEBODY@SOMESITE>
Subject: File transfer demonstration
To: The catcher in the rye <CATCHRYE@MYSITE>
begin 644 erich.com
MZV>0("`@("`@("`@("`@("`@4V\@>6]U(&UA;F%G960@=&\@555$14-/1$4@
M=&AE('1E<W0@9FEL92X-"B`@("`@("`@("`@("`@("`@("`@("`@0V]N9W)A
;='5L871I;VYS(2$-"B0:N@,!M`G-(;@`3,TA
`
end
John R. Somebody 1/14/89
SOMEBODY@SOMESITE CATCHRYE@MYSITE 1/14/89 file transfer demonstration
--------- this line does not belong to the file any more ---
From this example it should be easy to see what the next step is: Every
line above the "begin" line and every line below the "end" line has to be
removed. The remaining file then can be decoded using UUDECODE. If no
additional problems occurred the decoded program is identical with the
binary program the sender wanted to send. Now for possible
difficulties: UUENCODEd files contain special characters like brackets.
Now when you are reading a text file you usually can recognize the
intended special character even if it has been changed in a file
transfer process. But it is not possible to recognize changed
characters in a UUENCODEd file. So you have to find out if all the
characters arrived unchanged. For this you can use the method described
at the beginning of this paper, namely sending a file with all
characters together with a verbal description of the characters. All
remarks from the earlier part of the paper apply. Inspecting such a file
closely might help you to find out which characters were changed and
into what and with luck you can reverse this exchange process. The main
problem with the UU scheme is that the set of characters being used
contains special characters. So a variant of this method has been
devised. It is called the XXENCODE/XXDECODE process. Essentially it
functions like UUENCODE/UUDECODE, but the encoded file only contains
letters, digits, and the plus and the minus sign. The advantage is that
these characters usually are not changed when passed through different
computers, so chances are higher that such a file will arrive unchanged.
As with UUENCODE/UUDECODE you need the programs before you can start
transmission of binary files. The XX scheme is relatively new, so
usually it is easier to find programs for the UU scheme than for the XX
scheme.
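The removal of the "electronic envelope" described above (deleting every line above the "begin" line and every line below the "end" line) can also be done by a small program instead of an editor. The following is a minimal sketch in Python; the function name is made up for this example, and it assumes that no line of the mail text itself starts with the word "begin" followed by a blank.

```python
def strip_envelope(mail_text):
    """Keep only the lines from the "begin" line through the "end" line."""
    kept = []
    inside = False
    for line in mail_text.splitlines():
        if not inside and line.startswith("begin "):
            inside = True                 # the encoded file starts here
        if inside:
            kept.append(line)
            if line.rstrip() == "end":    # the encoded file stops here
                break
    if not kept or kept[-1].rstrip() != "end":
        raise ValueError("no complete begin/end block found")
    return "\n".join(kept) + "\n"
```

The result is what should be handed to the decoding program.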
It is important to be aware of the fact that UUENCODEd and XXENCODEd
files are more than 30 percent larger than the original file. This is
the price we have to pay for better transportability.
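The size increase follows from simple arithmetic: every 3 bytes become 4 characters (one third more), and every line of 45 bytes additionally carries one leading character and a line end. The sketch below, in Python, computes the size of the encoded lines only (without the "begin" and "end" lines). That a line end takes 2 characters, as in DOS text files, is an assumption; on other systems it may be 1.

```python
def encoded_body_size(n_bytes, eol_len=2):
    """Size in characters of the encoded lines for a file of n_bytes bytes."""
    full_lines, rest = divmod(n_bytes, 45)
    # a full line: 1 leading character + 60 characters + line end
    size = full_lines * (1 + 60 + eol_len)
    if rest:
        # the shorter last line: groups of 3 bytes, the last group filled up
        size += 1 + 4 * ((rest + 2) // 3) + eol_len
    return size

def overhead_percent(n_bytes, eol_len=2):
    """How much larger (in percent) the encoded lines are than the file."""
    return 100.0 * (encoded_body_size(n_bytes, eol_len) - n_bytes) / n_bytes
```

For a file of 45000 bytes this gives 63000 characters, i.e. 40 percent more; with line ends of 1 character it is about 38 percent.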
There is one more important concept you should be aware of when
transferring more than one file at a time and/or transferring big files.
It is the concept of an archive. An archive essentially is one file
created by pasting together and compressing one or more files. Usually
when transferring a few files you use an archiving program which creates
just one file out of a few files. This archived file also is smaller
than all the "source" files together. In the archiving process you need
two programs: the archiving program creating the archive and the
dearchiving program reconstructing the original files. The advantages of
using archives are:
1) It is impossible to forget a file belonging to a set of files when
transferring copies of an archive
2) The amount of data to be transferred is smaller and therefore uses
less disk space and less connect time for transferring them
electronically.
So if you want to send a few files belonging together it is quite common
to create an archive, then to send the archive and then have the
recipient reconstruct the original files by dearchiving. When you receive
a file with file name extension ARC it is highly probable that it is an
archive file. In this case the extension ARC denotes a special
archiving (= pasting together and compressing) scheme. There is a new
scheme around now which usually can be recognized by the file name
extension ZIP. The 2 programs needed to be able to work with the ZIP
scheme are PKZIP and PKUNZIP.
Let us look at an example of how to use this set of programs.
Let us assume we want to send 3 files named FILEA.TXT, FILEB.DTA and
FILEC.COM.
If we execute the command line
PKZIP ARCHIVE FILEA.TXT FILEB.DTA FILEC.COM
PKZIP will create a file ARCHIVE.ZIP. This file is our archive and
contains all 3 "source" files in a condensed form.
To reconstruct the original files we execute the command line
PKUNZIP ARCHIVE
which will create the 3 original files FILEA.TXT, FILEB.DTA and
FILEC.COM.
There are different programs around for the ARC variant of the process.
ARC and ARCX are a pair performing essentially the same function as
PKZIP and PKUNZIP, PKARC and PKXARC are another pair. There also is a
program called LHARC which performs archiving and dearchiving functions
with just one program. The difference is that PKZIP and PKUNZIP use the
ZIP scheme whereas ARC, ARCX, PKARC and PKXARC use the ARC scheme and
LHARC uses the LZH scheme. All these different schemes are
incompatible.
If you want to create an LZH-archive similar to the ZIP archive of
the previous example you can do so with the following command:
LHARC A ARCHIVE FILEA.TXT FILEB.DTA FILEC.COM
This will create a file ARCHIVE.LZH.
Extracting the files from the archive is done with the following
command:
LHARC E ARCHIVE
There is a special variant of archive files, so-called self extracting
archives. In this special case the archive and the dearchiving program
are pasted together. The result is an executable file (usually with
extension EXE) which, when executed, reconstructs the original files
contained in the archive. It is not possible to recognize
self-extracting archives from the file name extension, so you have to be
told that a certain file is a self-extracting archive.
So we have met two important concepts:
Encoding for creating "mailable" files
Archiving for creating smaller files
It is quite common to combine these 2 processes. So if we want to send
a set of files, first we create an archive containing all the files and
then encode this archive. This hybrid product is sent via E-mail. The
recipient first decodes the mail file into the archive file and then
dearchives the archive into the original files. In this way we combine
the advantages of compressing for reducing costs and of encoding to
allow better transportability.
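The combined process can be sketched in a few lines of Python. This is not how PKZIP or the UUENCODE programs are implemented; it merely uses Python's standard modules zipfile (for the ZIP scheme) and binascii (for the UU scheme) to show the order of the steps. The function names and the archive name ARCHIVE.ZIP are assumptions made for this example.

```python
import binascii
import io
import zipfile

def archive(files):
    """Paste together and compress {name: contents} into one ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buffer.getvalue()

def dearchive(blob):
    """Reconstruct {name: contents} from the bytes of a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}

def encode(blob, name="ARCHIVE.ZIP"):
    """Turn binary data into mailable text, 45 bytes per line."""
    lines = ["begin 644 " + name]
    for i in range(0, len(blob), 45):
        line = binascii.b2a_uu(blob[i:i + 45], backtick=True)
        lines.append(line.decode("ascii").rstrip("\n"))
    lines += ["`", "end"]
    return "\n".join(lines) + "\n"

def decode(text):
    """Turn the mailable text (envelope already removed) back into bytes."""
    body = text.splitlines()[1:-1]        # drop the "begin" and "end" lines
    return b"".join(binascii.a2b_uu(line) for line in body)
```

The sender computes encode(archive(files)) and mails the resulting text; the recipient removes the envelope and computes dearchive(decode(text)).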
APPENDIX A: CHARACTER TABLE
Next is a list of all printable characters together with
descriptions:
Characters of the ASCII table
blank
! exclamation mark
" double quote
# number sign
$ dollar sign
% percent sign
& ampersand
' (closing) single quote
( left parenthesis
) right parenthesis
* star
+ plus
, comma
- minus
. period
/ slash
digits
0123456789
: colon
; semicolon
< less
= equal
> greater
? question mark
@ at-sign
uppercase letters
ABCDEFGHIJKLMNOPQRSTUVWXYZ
[ left bracket
\ backslash
] right bracket
^ caret
_ underscore
` left single quote
lowercase letters
abcdefghijklmnopqrstuvwxyz
{ left curly brace
| vertical bar
} right curly brace
~ tilde
ASCII 127 is nonprintable
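A test file of this kind need not be typed by hand. The following Python sketch produces one line per printable ASCII character (numbers 32 to 126), each with its number, so that the receiver can tell what was meant even if the character itself arrives scrambled. The function name and the abbreviated table of descriptions are assumptions made for this example; the full descriptions are in the list above.

```python
# Descriptions for a few characters only; extend from the list above.
NAMES = {" ": "blank", "[": "left bracket", "|": "vertical bar", "~": "tilde"}

def character_lines(names=NAMES):
    """Return one line per printable ASCII character: number, character, name."""
    lines = []
    for code in range(32, 127):           # 127 is nonprintable and left out
        ch = chr(code)
        lines.append(f"{code:3d} {ch} {names.get(ch, '')}".rstrip())
    return lines

if __name__ == "__main__":
    print("\n".join(character_lines()))
```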
APPENDIX B: TECHNICAL DETAILS OF ENCODING AND DECODING
The rest of the paper is very technical, so you should read it only if
you have some knowledge of the mathematics underlying the functioning of
computers.
How do UUENCODE and UUDECODE work?
For UUENCODing, the bytes forming the file are grouped in groups of
three. Every byte is an 8-bit binary number, so every group of three
bytes is a 24-bit binary number. This number then is split into four
groups of 6 bits each, i.e. into 4 6-bit binary numbers. The 6-bit binary
numbers give all decimal numbers from 0 to 63. To every such 6-bit
number 32 (decimal) is added, giving numbers in the range from 32 to 95.
Every number then is replaced by the ASCII character associated with
this value. (32 becomes a blank, 33 becomes !, ... 95 becomes _ (an
underscore)). So the translation process converts each group of 3 bytes
into 4 printable characters.
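The conversion of one group of 3 bytes can be written down directly. The following is a minimal sketch in Python (the function name is made up for this example); it follows the plain scheme just described, in which the number 0 becomes a blank.

```python
def encode_group(three_bytes):
    """Convert exactly 3 bytes into 4 printable characters (UU scheme)."""
    if len(three_bytes) != 3:
        raise ValueError("need exactly 3 bytes")
    number = int.from_bytes(three_bytes, "big")      # one 24-bit number
    # split into four 6-bit numbers, leftmost bits first
    sixbit = [(number >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(chr(value + 32) for value in sixbit)
```

For instance, the three letters "Cat" (bytes 67, 97, 116) give the 6-bit numbers 16, 54, 5 and 52 and therefore the characters 0V%T.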
Additionally every group of 45 bytes (giving 60 characters) is grouped
into a line in the file to be sent. Then a leading character is added to
this line. The leading character is calculated by applying the encoding
scheme we just discussed to the number of bytes represented by the
line. (45+32=77, so for a line representing 45 bytes the leading
character is M (M is ASCII character 77)). Usually the last line is
shorter and therefore the leading character of the last line also is
different from M. Finally a first line containing "begin", a 3 digit
number (giving access privileges on UNIX systems and meaningless on
other systems) and the name of the original file and a last line
containing the word "end" is added.
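The construction of complete lines and of the complete encoded file can be sketched as follows in Python. The function names are made up for this example. The parameter zero is the character written for the number 0: a blank in the plain scheme described here, or ` as in the example file shown earlier in this paper. The line representing zero bytes just before "end" is taken from that example file.

```python
def encode_line(chunk, zero=" "):
    """Encode 1 to 45 bytes as one line: leading character plus groups."""
    if not 0 < len(chunk) <= 45:
        raise ValueError("a line represents 1 to 45 bytes")
    padded = chunk + b"\0" * (-len(chunk) % 3)   # fill up the last group
    values = [len(chunk)]                        # the byte count comes first
    for i in range(0, len(padded), 3):
        number = int.from_bytes(padded[i:i + 3], "big")
        values += [(number >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(zero if v == 0 else chr(v + 32) for v in values)

def encode_file(data, name, zero=" "):
    """Return the lines of the encoded file, from "begin" to "end"."""
    lines = ["begin 644 " + name]
    lines += [encode_line(data[i:i + 45], zero) for i in range(0, len(data), 45)]
    lines += [zero, "end"]       # a line representing 0 bytes, then "end"
    return lines
```

A full line therefore always starts with M and has 61 characters.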
The decoding program then mainly has to convert each group of 4
characters back into a group of three bytes (using the byte count given
by the first character of each line for consistency checks).
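Decoding one line, including the consistency check, might look like the following Python sketch (the function name is made up for this example). Masking each number with 63 has the effect that both a blank and ` are read as 0, so both variants of encoded files are accepted.

```python
def decode_line(line):
    """Convert one encoded line back into the bytes it represents."""
    if not line:
        raise ValueError("empty line")
    count = (ord(line[0]) - 32) & 0x3F        # bytes represented by the line
    body = line[1:]
    needed = 4 * ((count + 2) // 3)           # characters required for them
    if len(body) < needed:
        raise ValueError("line too short for its byte count")
    out = bytearray()
    for i in range(0, needed, 4):
        number = 0
        for ch in body[i:i + 4]:              # rebuild the 24-bit number
            number = (number << 6) | ((ord(ch) - 32) & 0x3F)
        out += number.to_bytes(3, "big")
    return bytes(out[:count])                 # drop the filler bytes
```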
There are some problems with this scheme. We already discussed the
possibility of special characters being scrambled. Additionally some
"smart" mailing programs assume that trailing blanks always are
unnecessary. Therefore they strip trailing blanks from every mail file.
If it is only text you want to read you will not notice the difference.
But a UUDECODing program will find out that the lines are too short
(the first character of the line gives information about the line
length!).
There are different solutions for this problem.
1) Replace blanks by ` (the single opening quote having ASCII value
32+64=96)
2) Add an additional nonblank character at the end of each line
3) Make the decoding program smart enough to produce the missing
blanks by itself.
All the solutions are nonstandardized, so if you have some troubles when
decoding you have to analyze them carefully. Solution number 2 usually
works better than the two other solutions. So you should try to get an
encoding program adding that additional character. Using an editor also
makes it possible to transform the different "extended" formats of
UUENCODEd files into one another.
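Two of these repairs are easy to sketch in Python (the function names are made up for this example). The first one does what solution 3 asks of a "smart" decoding program: it uses the leading character to find out how long the line should be and puts the missing blanks back. The second one converts a plain file into the format of solution 1.

```python
def restore_blanks(line):
    """Pad a line whose trailing blanks were stripped by a mailing program."""
    if not line:
        return line                           # nothing to go by
    count = (ord(line[0]) - 32) & 0x3F        # bytes represented by the line
    wanted = 1 + 4 * ((count + 2) // 3)       # leading character plus groups
    return line.ljust(wanted)

def blanks_to_backquotes(line):
    """Replace every blank of an encoded line by ` (solution 1)."""
    return line.replace(" ", "`")
```

Both functions must be applied to the encoded lines only, not to the "begin" and "end" lines.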
How do XXENCODE and XXDECODE work?
XXENCODE uses the same splitting technique as the UU scheme (3 bytes
into 4 6-digit binary numbers). Then every such number is converted
into a character according to the following sequence:
+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
So (decimal) 0 becomes +, 1 becomes -, (number) 2 becomes (character) 0,
.... 63 becomes z.
The mechanism for adding byte counts to lines is identical to the UU
scheme with the difference that the numbers again are coded according to
the above sequence of letters, digits, + and -. So it even is possible
to convert UUENCODEd files into XXENCODEd files using the replace
feature of a text editor.
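The same replacement can be done by a program. In the following Python sketch (names made up for this example) the character standing for the number n in the UU scheme is replaced by the character at position n of the XX sequence. The character ` is treated like a blank, i.e. as 0. The "begin" and "end" lines are left alone.

```python
XX_ALPHABET = "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
UU_ALPHABET = "".join(chr(value + 32) for value in range(64))

# translation table: UU character for n  ->  XX character for n
UU_TO_XX = str.maketrans(UU_ALPHABET + "`", XX_ALPHABET + "+")

def uu_line_to_xx(line):
    """Convert one encoded line from the UU scheme to the XX scheme."""
    return line.translate(UU_TO_XX)

def uu_to_xx(lines):
    """Convert a whole encoded file given as a list of lines."""
    return [line if line.startswith("begin ") or line == "end"
            else uu_line_to_xx(line)
            for line in lines]
```

For example, the UU line #0V%T (the three bytes of "Cat") becomes 1Eq3o.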
ACKNOWLEDGEMENTS
The author wishes to thank Ted Werntz whose comments and suggestions
helped enormously to improve the paper.