Amateur Radio Internet Technical Reference
This file describes the "guts" of the Internet package for the benefit
of programmers who wish to write their own applications, or adapt the
code to different hardware environments.
The code as distributed includes both the functions of an IP packet switch
and an end-host system, including several servers. The implementation is highly
modular, however. For example, if one wants to build a dedicated packet switch
without any local applications, the various applications and the TCP and UDP
modules may easily be omitted to save space.
The package allows multiple simultaneous applications, each supporting
multiple simultaneous users, each using TCP and/or UDP. The only limit is
memory space, which is getting quite tight on the 820; the C compiler for the
IBM PC seems to generate much more compact code (typically 1/2 as large as for
the Z-80) so the PC seems more promising as a large-scale server.
Data Structures
To increase portability, the pseudo-types "int16" and "int32" are used to mean
an unsigned 16-bit integer and a signed 32-bit integer, respectively.
Ordinarily these types are defined in machdep.h to be "unsigned int"
and "long".
The various modules pass data in chained structures called mbufs, with the
following format:
struct mbuf {
    struct mbuf *next;   /* Links mbufs belonging to single packets */
    struct mbuf *anext;  /* Links packets on queues */
    char *data;          /* Pointer to start of actual data in buffer */
    int16 cnt;           /* Length of data in buffer */
};
Although somewhat cumbersome to work with, mbufs make it possible to avoid
memory-to-memory copies that limit performance. For example, when user data
is transmitted it must first traverse several protocol layers before
reaching the transmitter hardware. With mbufs, each layer adds its
protocol header by allocating an mbuf and linking it to the head of the
mbuf "chain" given it by the higher layer, thus avoiding several copy
operations.
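The header-prepending step described above can be sketched as follows. This
is only an illustration under stated assumptions: the struct is the one shown
above, but sketch_alloc, sketch_pushhdr, sketch_len and sketch_free are
hypothetical helpers written for this example (they are not the primitives in
mbuf.c), and ANSI prototypes are used so the fragment compiles on a current
compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint16_t int16;          /* unsigned 16-bit, as defined in the text */

struct mbuf {
    struct mbuf *next;           /* Links mbufs belonging to single packets */
    struct mbuf *anext;          /* Links packets on queues */
    char *data;                  /* Pointer to start of actual data in buffer */
    int16 cnt;                   /* Length of data in buffer */
};

/* Hypothetical allocator: descriptor and data area share one malloc block. */
struct mbuf *sketch_alloc(int16 size)
{
    struct mbuf *bp = malloc(sizeof *bp + size);
    if (bp == NULL)
        return NULL;
    bp->next = bp->anext = NULL;
    bp->data = (char *)(bp + 1);
    bp->cnt = 0;
    return bp;
}

/* Prepend a header. Only the header bytes are copied; the payload chain
 * handed down by the higher layer is linked in, not copied. */
struct mbuf *sketch_pushhdr(struct mbuf *chain, const void *hdr, int16 len)
{
    struct mbuf *hp = sketch_alloc(len);
    if (hp == NULL)
        return NULL;
    memcpy(hp->data, hdr, len);
    hp->cnt = len;
    hp->next = chain;
    return hp;
}

/* Total bytes in one packet: follows "next", never "anext". */
unsigned sketch_len(const struct mbuf *bp)
{
    unsigned total = 0;
    for (; bp != NULL; bp = bp->next)
        total += bp->cnt;
    return total;
}

/* Free every mbuf in one packet. */
void sketch_free(struct mbuf *bp)
{
    while (bp != NULL) {
        struct mbuf *n = bp->next;
        free(bp);
        bp = n;
    }
}
```

Each layer would call something like sketch_pushhdr once on the way down, so
a packet that crosses three layers ends up as a chain of four mbufs with the
user data still sitting in its original buffer.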
A number of primitives operating on mbufs are available in mbuf.c.
The user may create, fill, empty and free mbufs himself with the alloc_mbuf
and free_mbuf primitives, or at the cost of a single memory-to-memory copy
he may use the more convenient qdata() and dqdata() primitives.
Timer Services
TCP and IP require timers. A timer package is included; the user need only
arrange to call the single entry point "tick" on a regular basis. The constant
MSPTICK in timer.h should be defined as the interval between ticks in
milliseconds. One second resolution is adequate. Since it can trigger a
considerable amount of activity, including upcalls to user level, "tick"
should not be called from an interrupt handler. A clock interrupt should
set a flag which will then cause "tick" to be called at user level.
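The flag arrangement recommended above might look like the sketch below. The
names clock_isr, poll_timer, tick_pending and ticks_run are assumptions made
for this example, and the tick() shown here is only a stand-in for the
package's entry point so that the fragment is self-contained.

```c
#include <assert.h>
#include <signal.h>

/* Set by the clock interrupt, cleared at user level. */
volatile sig_atomic_t tick_pending = 0;

/* Counts stand-in tick() calls, for illustration only. */
unsigned long ticks_run = 0;

/* Stand-in for the package's "tick" entry point. */
void tick(void)
{
    ticks_run++;
}

/* Clock interrupt handler: records that a tick is due and nothing more. */
void clock_isr(void)
{
    tick_pending = 1;
}

/* Called from the main (user-level) loop between other work. */
void poll_timer(void)
{
    if (tick_pending) {
        tick_pending = 0;
        tick();                  /* may trigger upcalls; safe at user level */
    }
}
```

With a simple flag, two interrupts that arrive between polls collapse into one
tick. If lost ticks matter, a counter incremented by the handler and drained by
the main loop is the obvious variation.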
Internet Type-of-Service
One of the features of the Internet is the ability to specify precedence
(i.e., priority) on a per-datagram basis. There are 8 levels of precedence,
with the bottom 6 defined by the DoD as Routine, Priority, Immediate,
Flash, Flash Override and CRITICAL. (Two more are available for internal
network functions). For amateur use we can use the lower four as
Routine, Welfare, Priority and Emergency. Three more bits specify class
of service, indicating that especially high reliability, high throughput
or low delay is needed for this connection. Constants for this field
are defined in internet.h.
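As a rough illustration of how these fields share one byte, the sketch below
packs a precedence level and the three class-of-service bits using the bit
layout given in RFC-791 (precedence in the top three bits, then delay,
throughput and reliability). The constant and function names are invented for
this example and are not necessarily those in internet.h; numbering the four
amateur levels 0 through 3 is likewise an assumption.

```c
#include <assert.h>

/* Illustrative names only; see internet.h for the package's constants. */
enum {
    PREC_ROUTINE   = 0,
    PREC_WELFARE   = 1,
    PREC_PRIORITY  = 2,
    PREC_EMERGENCY = 3
};

#define TOS_DELAY        0x10   /* low delay requested */
#define TOS_THROUGHPUT   0x08   /* high throughput requested */
#define TOS_RELIABILITY  0x04   /* high reliability requested */

/* Combine a precedence level (0-7) with class-of-service flags. */
unsigned char make_tos(int prec, int flags)
{
    assert(prec >= 0 && prec <= 7);
    return (unsigned char)((prec << 5) | (flags & 0x1c));
}

/* Extract the precedence level from a type-of-service byte. */
int tos_precedence(unsigned char tos)
{
    return (tos >> 5) & 7;
}
```

For example, make_tos(PREC_EMERGENCY, TOS_DELAY) yields 0x70 under these
assumptions.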
The Internet Protocol Implementation
While the user does not ordinarily see this level directly, it is described
here for sake of completeness. Readers interested only in the interfaces seen
by the applications programmer should skip to the TCP and UDP sections.
The IP implementation consists of three major functions: ip_route, ip_send
and ip_recv.
IP Gateway (Packet Router) Support
The first, ip_route, is the IP packet switch. It takes two arguments: a
pointer to the mbuf containing the IP datagram, and a flag indicating whether
the datagram was received as a link-level broadcast:
void
ip_route(bp,rxbroadcast)
struct mbuf *bp; /* Datagram pointer */
int rxbroadcast; /* Don't forward */
All IP datagrams, coming or going, pass through this function. After option
processing, if any, the datagram's destination address is extracted. If it
corresponds to the local host, it is "kicked upstairs" to the upper half of
IP and thence to the appropriate protocol module. Otherwise, an internal
routing table is consulted to determine where the datagram should be forwarded.
The routing table uses hashing keyed on IP destination addresses, called
"targets". If the target address is not found, a special "default" entry,
if available, is used. If a default entry is not available either, an ICMP
"Destination Unreachable" message containing the offending IP header is
returned to the sender.
The "rxbroadcast" flag is used to prevent forwarding of broadcast packets,
a practice which might otherwise result in spectacular routing loops. Any
subnet interface driver receiving a packet addressed to the broadcast address
within that subnet MUST set this flag. All other packets (including locally
originated packets) should have "rxbroadcast" set to zero.
ip_route ignores the IP destination address in broadcast packets, passing them
up to the appropriate higher level protocol which is also made aware of their
broadcast nature. (TCP and ICMP ignore them; only UDP can accept them).
Entries are added to the IP routing table with the rt_add function:
int
rt_add(target,gateway,metric,interface)
int32 target; /* IP address of target */
int32 gateway; /* IP address of neighbor gateway to reach this target */
int metric; /* "cost" measurement, available for routing decisions */
struct interface *interface; /* device interface structure */
"target" is the IP address of the destination; it becomes the hash index key
for subsequent packet destination address lookups. If target == 0, the
default entry is modified. "metric" is simply stored in the table; it is
available for routing cost calculations when an automatic routing protocol
is written. "interface" is the address of a control structure for the
particular device to which the datagram should be sent; it is defined
in the section "IP Interfaces".
rt_add returns 0 on success, -1 on failure (e.g., out of memory).
To remove an entry from the routing table, only the target address need
be specified to the rt_drop call:
int
rt_drop(target)
int32 target;
rt_drop returns 0 on success, -1 if the target could not be found.
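To make the table behavior concrete, here is a minimal sketch of a hashed
routing table with a default entry. It follows the rt_add and rt_drop
interfaces and return values described above, but it is not the code in the
package: the bucket count, the struct route layout, the trimmed struct
interface and the rt_lookup helper are all assumptions for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int32_t int32;                     /* signed 32-bit, as in the text */

struct interface { const char *name; };    /* trimmed stand-in */

struct route {
    struct route *next;                    /* hash chain */
    int32 target;
    int32 gateway;
    int metric;
    struct interface *interface;
};

#define NROUTE 5                           /* bucket count: an assumption */
struct route *routes[NROUTE];
struct route r_default;
int have_default = 0;

static unsigned rt_hash(int32 target)
{
    return (uint32_t)target % NROUTE;
}

/* Hypothetical lookup: exact match, else default, else NULL
 * (at which point ip_route would return an ICMP unreachable). */
struct route *rt_lookup(int32 target)
{
    struct route *rp;
    for (rp = routes[rt_hash(target)]; rp != NULL; rp = rp->next)
        if (rp->target == target)
            return rp;
    return have_default ? &r_default : NULL;
}

int rt_add(int32 target, int32 gateway, int metric, struct interface *interface)
{
    struct route *rp;
    if (target == 0) {                     /* target 0 selects the default */
        rp = &r_default;
        have_default = 1;
    } else {
        unsigned h = rt_hash(target);
        for (rp = routes[h]; rp != NULL; rp = rp->next)
            if (rp->target == target)
                break;                     /* update existing entry */
        if (rp == NULL) {
            rp = malloc(sizeof *rp);
            if (rp == NULL)
                return -1;                 /* out of memory */
            rp->next = routes[h];
            routes[h] = rp;
        }
    }
    rp->target = target;
    rp->gateway = gateway;
    rp->metric = metric;
    rp->interface = interface;
    return 0;
}

int rt_drop(int32 target)
{
    struct route **pp, *rp;
    if (target == 0) {
        if (!have_default)
            return -1;
        have_default = 0;
        return 0;
    }
    for (pp = &routes[rt_hash(target)]; (rp = *pp) != NULL; pp = &rp->next) {
        if (rp->target == target) {
            *pp = rp->next;                /* unlink, then free */
            free(rp);
            return 0;
        }
    }
    return -1;                             /* target not found */
}
```

A gateway with one radio port and one serial link might add a specific entry
for each radio neighbor and a default entry pointing down the serial link;
any destination without its own entry then follows the default.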
IP Interfaces
Every lower level interface used to transmit IP datagrams must have an
"interface" structure, defined as follows:
/* Interface control structure */
struct interface {
    struct interface *next;  /* Linked list pointer */
    char *name;              /* Ascii string with interface name */
    int16 mtu;               /* Maximum transmission unit size */
    int (*send)();           /* Routine to call to send datagram */
    int (*output)();         /* Routine to call to send raw packet */
    int (*recv)();           /* Routine to kick to process input */
    int (*stop)();           /* Routine to call before detaching */
    int16 dev;               /* Subdevice number to pass to send */
    int16 flags;             /* State of interface */
#define IF_ACTIVE 0x01
#define IF_BROADCAST 0x04    /* Interface is capable of broadcasting */
};
Part of the interface structure is for the private use of the device driver.
"dev" is used to distinguish between one of several identical devices (e.g.,
serial links or radio channels) that might share the same send routine.
A pointer to this structure is kept in the routing table. Two fields in the
interface structure are examined by ip_route: "mtu" and "send". The maximum
transmission unit size represents the largest datagram that this device can
handle; ip_route will do IP-level fragmentation as required to meet this
limit before calling "send", the function to queue datagrams on this
interface. "send" is called as follows:
(*send)(bp,interface,gateway,precedence,delay,throughput,reliability)
struct mbuf *bp; /* Pointer to datagram */
struct interface *interface; /* Interface structure */
int32 gateway; /* IP address of gateway */
char precedence; /* TOS bits from IP header */
char delay;
char throughput;
char reliability;
The "interface" and "gateway" arguments are kept in the routing table and
passed on each call to the send routine. The interface pointer is passed
again because several interfaces might share the same output driver (e.g.,
several identical physical channels). "gateway" is the IP address of the
neighboring IP gateway on the other end of the link; if a link-level address
is required, the send routine must map this address either dynamically (e.g.,
with the Address Resolution Protocol, ARP) or with a static lookup table.
If the link is point-to-point, link-level addresses are unnecessary, and the
send routine can therefore ignore the gateway address.
The Internet Type-of-Service (TOS) bits are passed to the interface driver
as separate arguments. If tradeoffs exist within the subnet between these
various classes of service, the driver may use these arguments to control
them (e.g., optional use of link level acknowledgments, priority queuing,
etc.)
It is expected that the send routine will put a link level header on the front
of the packet, add it to an internal output queue, start output (if not already
active) and return. It must NOT busy-wait for completion (unless it is a very
fast device, e.g., Ethernet) since that blocks the entire system.
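The queue-and-return behavior expected of a send routine can be sketched as
below for a point-to-point link. Everything apart from the argument list is an
assumption: struct txstate, p2p_send, p2p_txdone and start_output are names
made up for this example, the interface structure is trimmed to the one field
used, and the link-level header and the actual hardware access are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t int16;
typedef int32_t int32;

struct mbuf {
    struct mbuf *next;
    struct mbuf *anext;          /* Links packets on queues */
    char *data;
    int16 cnt;
};

struct interface { int16 dev; }; /* trimmed to the field used here */

/* Hypothetical per-device transmit state. */
struct txstate {
    struct mbuf *head, *tail;    /* output queue, linked through anext */
    struct mbuf *active;         /* packet now being sent, if any */
};
struct txstate tx[2];

/* Start the next packet if the transmitter is idle. Never waits. */
static void start_output(struct txstate *t)
{
    if (t->active != NULL || t->head == NULL)
        return;                  /* busy, or nothing queued */
    t->active = t->head;
    t->head = t->head->anext;
    if (t->head == NULL)
        t->tail = NULL;
    t->active->anext = NULL;
    /* a real driver would program the hardware here and return at once */
}

/* Send routine: queue the datagram, kick the transmitter, return. */
int p2p_send(struct mbuf *bp, struct interface *interface, int32 gateway,
             char precedence, char delay, char throughput, char reliability)
{
    struct txstate *t = &tx[interface->dev];

    (void)gateway;               /* point-to-point: no link address needed */
    (void)precedence; (void)delay; (void)throughput; (void)reliability;

    bp->anext = NULL;
    if (t->tail != NULL)
        t->tail->anext = bp;
    else
        t->head = bp;
    t->tail = bp;
    start_output(t);
    return 0;
}

/* Transmit-complete interrupt: hand back the finished packet and
 * start the next one, if any. The caller frees the returned chain. */
struct mbuf *p2p_txdone(int16 dev)
{
    struct txstate *t = &tx[dev];
    struct mbuf *done = t->active;
    t->active = NULL;
    start_output(t);
    return done;
}
```

The point of the arrangement is that p2p_send does a bounded amount of work
regardless of line speed; the completion interrupt, not the caller, drives the
queue forward.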
Any interface that uses ARP must also provide an "output" routine. It is a
lower level entry point that allows the caller to specify the fields in the
link header. ARP uses it to broadcast a request for a given IP address. It may
be the same routine used internally by the driver to send IP datagrams
once the link level fields have been determined. It is called as follows:
(*output)(interface,dest,src,type,bp)
struct interface *interface; /* Pointer to interface structure */
char dest[]; /* Link level destination address */
char src[]; /* Link level source address */
int16 type; /* Protocol type field for link level */
struct mbuf *bp; /* Data field (IP datagram) */
IP Host Support
All of the modules described thus far are required in all systems. However,
the routines that follow are necessary only if the system is to support
higher-level applications. In a standalone IP gateway (packet switch)
without servers or clients, the following modules (IP user level, TCP and
UDP) may be omitted to allow additional space for buffering; define the
flag GWONLY when compiling iproute.c to avoid referencing the user level
half of IP.
The following function is called by ip_route() whenever a datagram arrives
that is addressed to the local system.
void
ip_recv(bp,rxbroadcast)
struct mbuf *bp; /* Datagram */
char rxbroadcast; /* Incoming broadcast */
ip_recv reassembles IP datagram fragments, if necessary, and calls the input
function of the next layer protocol (e.g., tcp_input, udp_input) with the
appropriate arguments, as follows:
(*protrecv)(bp,protocol,source,dest,tos,length,rxbroadcast);
struct mbuf *bp; /* Pointer to packet minus IP header */
char protocol; /* IP protocol ID */
int32 source; /* IP address of sender */
int32 dest; /* IP address of destination (i.e., us) */
char tos; /* IP type-of-service field in datagram */
int16 length; /* Length of datagram minus IP header */
char rxbroadcast; /* Incoming broadcast */
The list of protocols is contained in a switch() statement in the ip_recv
function. If the protocol is unsupported, an ICMP Protocol Unreachable
message is returned to the sender unless the packet came in as a broadcast.
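The dispatch decision just described can be sketched as a small switch. The
protocol numbers are the standard assigned values (ICMP 1, TCP 6, UDP 17); the
macro names, the enum and the function demux are illustrative only and do not
reproduce the package's ip_recv.

```c
#include <assert.h>

#define ICMP_PTCL  1
#define TCP_PTCL   6
#define UDP_PTCL  17

/* What to do with a reassembled datagram (illustrative only). */
enum disposition {
    TO_ICMP,
    TO_TCP,
    TO_UDP,
    SEND_PROT_UNREACH,   /* ICMP Protocol Unreachable to the sender */
    DROP_SILENTLY        /* unsupported protocol, arrived as broadcast */
};

enum disposition demux(unsigned char protocol, char rxbroadcast)
{
    switch (protocol) {
    case ICMP_PTCL:
        return TO_ICMP;
    case TCP_PTCL:
        return TO_TCP;
    case UDP_PTCL:
        return TO_UDP;
    default:
        /* Never answer a broadcast with an ICMP error. */
        return rxbroadcast ? DROP_SILENTLY : SEND_PROT_UNREACH;
    }
}
```

Supporting a new protocol under this scheme amounts to adding one more case
that calls its input function.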
Higher level protocols such as TCP and UDP use the ip_send routine to generate
IP datagrams. The arguments to ip_send correspond directly to fields in the IP
header, which is generated and put in front of the user's data before being
handed to ip_route:
ip_send(source,dest,protocol,tos,ttl,bp,length,id,df)
int32 source; /* source address */
int32 dest; /* Destination address */
char protocol; /* Protocol */
char tos; /* Type of service */
char ttl; /* Time-to-live */
struct mbuf *bp; /* Data portion of datagram */
int16 length; /* Optional length of data portion */
int16 id; /* Optional identification */
char df; /* Don't-fragment flag */
This interface is modeled very closely after the example given on page 32
of RFC-791. Zeros may be passed for id or ttl, and system defaults will
be provided. If zero is passed for length, it will be calculated automatically.
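The correspondence between the arguments and the header can be shown with a
sketch that fills in a 20-byte, option-free IP header and its checksum as laid
out in RFC-791. This is not the package's ip_send: build_iphdr, cksum, put16,
put32, next_id and the DEF_TTL value are assumptions for this example,
addresses are taken as unsigned to keep the shifts well defined, and the
automatic length calculation from the mbuf chain is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPLEN    20              /* header length without options */
#define DEF_TTL  32              /* placeholder default, not the package's */

uint16_t next_id = 1;            /* source of default identification values */

/* Standard Internet checksum: one's complement of the one's complement
 * sum of the data taken as big-endian 16-bit words. */
uint16_t cksum(const unsigned char *p, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

static void put16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xff);
}

static void put32(unsigned char *p, uint32_t v)
{
    put16(p, (uint16_t)(v >> 16));
    put16(p + 2, (uint16_t)(v & 0xffff));
}

void build_iphdr(unsigned char h[IPLEN], uint32_t source, uint32_t dest,
                 unsigned char protocol, unsigned char tos, unsigned char ttl,
                 uint16_t datalen, uint16_t id, int df)
{
    if (ttl == 0)
        ttl = DEF_TTL;           /* zero means "use the default" */
    if (id == 0)
        id = next_id++;

    h[0] = 0x45;                 /* version 4, header length 5 words */
    h[1] = tos;
    put16(h + 2, (uint16_t)(IPLEN + datalen));
    put16(h + 4, id);
    put16(h + 6, df ? 0x4000 : 0);   /* don't-fragment flag, offset 0 */
    h[8] = ttl;
    h[9] = protocol;
    put16(h + 10, 0);            /* checksum field is zero while summing */
    put32(h + 12, source);
    put32(h + 16, dest);
    put16(h + 10, cksum(h, IPLEN));
}
```

A receiver can verify such a header by running the same cksum routine over
all 20 bytes, checksum field included; a correct header gives zero.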
The Transmission Control Protocol (TCP)
A TCP connection is uniquely identified by the concatenation of local and
remote "sockets". In turn, a socket consists of a host address (a 32-bit
integer) and a TCP port (a 16-bit integer), defined by the C structure
struct socket {
    long address;   /* 32-bit IP address */
    short port;     /* 16-bit TCP port */
};
It is therefore possible to have several simultaneous but distinct connections
to the same port on a given machine, as long as the remote sockets are
distinct. Port numbers are assigned either through mutual agreement, or more
commonly when a "standard" service is involved, as a "well known port" number.
For example, to obtain standard remote login service using the TELNET
presentation-layer protocol, by convention you initiate a connection to TCP
port 23; to send mail using the Simple Mail Transfer Protocol (SMTP) you
connect to port 25. ARPA maintains port number lists and periodically
publishes them; the latest revision is RFC-960, "Assigned Numbers".
They will also assign port numbers to a new application on request if it
appears to be of general interest.
TCP connections are best modeled as a pair of one-way paths (one in each
direction) rather than single full-duplex paths. A TCP "close" really
means "I have no more data to send". Station A may close its path to station
B leaving the reverse path from B to A unaffected; B may continue to send data
to A indefinitely until it too closes its half of the connection. Even after
a user initiates a close, TCP continues to retransmit any unacknowledged
data if necessary to ensure that it reaches the other end. This is known as
"graceful close" and greatly simplifies certain applications such as FTP.
TCP Module Overview
This package is written as a "module" intended to be compiled and linked with
the application(s) so that they can be run as one program on the same machine.
This greatly simplifies the user/TCP interface, which becomes just a set of
internal subroutine calls on a single machine. The internal TCP state (e.g.,
the address of the remote station) is easily accessed. Reliability is
improved, since any hardware failure that kills TCP will likely take its
applications with it anyway. Only IP datagrams flow out of the machine across
hardware interfaces (such as asynch RS-232 ports or whatever else is available)
so hardware flow control or complicated host/front-end protocols are
unnecessary.
The TCP supports five basic operations on a connection: open_tcp, send_tcp,
recv_tcp, close_tcp and del_tcp. A sixth, state_tcp, is provided
mainly for debugging. Since this TCP module cannot assume the presence of
a sleep/wakeup facility from the underlying operating system, functions
that would ordinarily block (e.g., recv_tcp when no data is available)
instead set net_error to the constant EWOULDBLK and immediately return -1.
Asynchronous notification of events such as data arrival can be obtained
through the upcall facility described below under open_tcp.
Each TCP function is summarized in the following section in the form of C
declarations and descriptions of each argument.
int net_error;
This global variable stores the specific cause of an error from one of the
TCP or UDP functions. All functions returning integers (i.e., all except
open_tcp) return -1 in the event of an error, and net_error should be
examined to determine the cause. The possible errors are defined as constants
in the header file netuser.h.
/* Open a TCP connection */
struct tcb *
open_tcp(lsocket,fsocket,mode,window,r_upcall,t_upcall,s_upcall,tos,user)
struct socket *lsocket; /* Local socket */
struct socket *fsocket; /* Remote socket */
int mode; /* Active/passive/server */
int16 window; /* Receive window (and send buffer) sizes */
void (*r_upcall)(); /* Function to call when data arrives */
void (*t_upcall)(); /* Function to call when ok to send more data */
void (*s_upcall)(); /* Function to call when connection state changes */
char tos; /* Internet Type-of-Service */
int *user; /* Pointer for convenience of user */
"lsocket" and "fsocket" are pointers to the local and foreign sockets,
respectively.
"mode" may take on three values, all defined in netuser.h. If mode is
TCP_PASSIVE, no packets are sent, but a TCP control block is created that will
accept a subsequent active open from another TCP. If a specific foreign socket
is passed to a passive open, then connect requests from any other foreign socket
will be rejected. If the foreign socket fields are set to zero, or if fsocket
is NULLSOCK, then connect requests from any foreign socket will be accepted.
If mode is TCP_ACTIVE, TCP will initiate a connection to a remote socket that
must already have been created in the LISTEN state by its client. The foreign
socket must be completely specified in an active open. When mode is TCP_SERVER,
open_tcp behaves as though TCP_PASSIVE was given except that an internal "clone"
flag is set. When a connection request comes in, a fresh copy of the TCP control
block is created and the original is left intact. This allows multiple
sessions to exist simultaneously; if TCP_PASSIVE were used instead only the
first connect request would be accepted.
"r_upcall", "t_upcall" and "s_upcall" provide optional upcall or pseudo-
interrupt mechanisms useful when running in a non operating system environment.
Each of the three arguments, if non-NULL, is taken as the address of a
user-supplied function to call when receive data arrives, transmit queue space
becomes available, or the connection state changes. The three functions are
called with the following arguments:
(*r_upcall)(tcb,count); /* count == number of bytes in receive queue */
(*t_upcall)(tcb,avail); /* avail == space available in send queue */
(*s_upcall)(tcb,oldstate,newstate);
Note: whenever a single event invokes more than one upcall the order in
which the upcalls are made is not strictly defined. In general, though,
the Principle of Least Astonishment is followed. E.g., when entering the
ESTABLISHED state, the state change upcall is invoked first, followed by
the transmit upcall. When an incoming segment contains both data and FIN,
the receive upcall is invoked first, followed by the state change to
CLOSE_WAIT state. In this case, the user may interpret this state change
as an "end of file" indicator.
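A pair of user-supplied upcall functions might be written along the lines
below. The sketch makes several assumptions that the text does not confirm:
that the "user" pointer given to open_tcp can be recovered from the TCB as
tcb->user, that the upcall arguments are plain ints, and that the states have
the numeric values shown (the names follow RFC 793). The struct tcb here is a
trimmed stand-in, and struct session, my_r_upcall and my_s_upcall are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct tcb { int *user; };       /* stand-in; the real TCB is much larger */

/* State names from RFC 793; numeric values are assumptions. */
enum {
    CLOSED, LISTEN, SYN_SENT, SYN_RECEIVED, ESTABLISHED,
    FINWAIT1, FINWAIT2, CLOSE_WAIT, CLOSING, LAST_ACK, TIME_WAIT
};

/* Hypothetical per-connection application state. */
struct session {
    int connected;               /* reached ESTABLISHED */
    int eof;                     /* remote side has closed */
    int pending;                 /* bytes waiting on the receive queue */
};

/* Receive upcall: note how much data is waiting; read it later
 * from the main loop with recv_tcp. */
void my_r_upcall(struct tcb *tcb, int count)
{
    struct session *sp = (struct session *)tcb->user;
    sp->pending = count;
}

/* State-change upcall: track connection setup and remote close. */
void my_s_upcall(struct tcb *tcb, int oldstate, int newstate)
{
    struct session *sp = (struct session *)tcb->user;
    (void)oldstate;
    if (newstate == ESTABLISHED)
        sp->connected = 1;
    else if (newstate == CLOSE_WAIT)
        sp->eof = 1;             /* treat as end of file */
}
```

The address of the session structure would be passed as the last argument of
open_tcp (cast to int *) so that each connection finds its own state.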
"tos" is the Internet type-of-service field. This parameter is passed along
to IP and is included in every datagram. The actual precedence value used is
the higher of the two specified in the corresponding pair of open_tcp calls.
open_tcp returns a pointer to an internal Transmission Control Block
(tcb). This "magic cookie" must be passed back as the first argument
to all other TCP calls. In event of error, the NULL pointer (0) is
returned and net_error is set to the reason for the error.
The only limit on the number of TCBs that may exist at any time (i.e., the
number of simultaneous connections) is the amount of free memory on the
machine. Each TCB on a 16-bit processor currently takes up 111 bytes;
additional memory is consumed and freed dynamically as needed to buffer send
and receive data. Deleting a TCB (see the del_tcp() call) reclaims its space.
/* Send data on a TCP connection */
int
send_tcp(tcb,bp)
struct tcb *tcb; /* TCB pointer */
struct mbuf *bp; /* Pointer to user's data mbufs */
"tcb" is the pointer returned by the open_tcp() call. "bp" points to the
user's mbuf with data to be sent. After being passed to send_tcp, the user
must no longer access the data buffer. TCP uses positive acknowledgments with
retransmission to ensure in-order delivery, but this is largely invisible to
the user. Once the remote TCP has acknowledged the data, the buffer will
be freed automatically.
TCP does not enforce a limit on the number of bytes that may be queued
for transmission, but it is recommended that the application not send any
more than the amount passed as "avail" in the transmit upcall. The package
uses shared, dynamically allocated buffers, and it is entirely possible
for a misbehaving user task to run the system out of buffers.
/* Receive data on a TCP connection */
int
recv_tcp(tcb,bp,cnt)
struct tcb *tcb;
struct mbuf **bp;
int16 cnt;
recv_tcp() passes back through bp a pointer to an mbuf chain containing any
available receive data, up to a maximum of "cnt" bytes. The actual number of
bytes received (the lesser of "cnt" and the number pending on the receive
queue) is returned. If no data is available, net_error is set to EWOULDBLK
and -1 is returned; the r_upcall mechanism may be used to determine when data
arrives. (Technical note: "r_upcall" is called whenever a PUSH or FIN bit is
seen in an incoming segment, or if the receive window fills. It is called
before an ACK is sent back to the remote TCP, in order to give the user an
opportunity to piggyback any data in response.)
When the remote TCP closes its half of the connection and all prior incoming
data has been read by the local user, subsequent calls to recv_tcp return 0
rather than -1 as an "end of transmission" indicator. Note that the local
application is notified of a remote close (i.e., end-of-file) by a state-change
upcall with the new state being CLOSE_WAIT; if the local application
has closed first, a remote close is indicated by a state-change upcall to
either CLOSING or TIME_WAIT state. (CLOSING state is used only when the
two ends close simultaneously and their FINs cross in the mail).
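Since recv_tcp has three distinct outcomes besides a hard error, a caller may
find it helpful to sort them out in one place, as in this sketch. The numeric
value given to EWOULDBLK is a placeholder (the real definition is in
netuser.h), and the enum and classify_recv are hypothetical names for this
example.

```c
#include <assert.h>

#define EWOULDBLK 1              /* placeholder value; see netuser.h */

enum rx_result {
    RX_DATA,                     /* rc bytes were returned through bp */
    RX_EOF,                      /* remote closed and all data was read */
    RX_WAIT,                     /* nothing yet; wait for r_upcall */
    RX_ERROR                     /* some other failure; see net_error */
};

/* rc is the value returned by recv_tcp; err is net_error afterward. */
enum rx_result classify_recv(int rc, int err)
{
    if (rc > 0)
        return RX_DATA;
    if (rc == 0)
        return RX_EOF;
    return (err == EWOULDBLK) ? RX_WAIT : RX_ERROR;
}
```

A typical caller would loop on recv_tcp while the result is RX_DATA, return to
its main loop on RX_WAIT, and call close_tcp on RX_EOF once it has nothing
further to send.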
/* Close a TCP connection */
close_tcp(tcb)
struct tcb *tcb;
This tells TCP that the local user has no more data to send. However, the
remote TCP may continue to send data indefinitely to the local user, until
the remote user also does a close_tcp. An attempt to send data after a
close_tcp is an error.
/* Delete a TCP connection */
del_tcp(tcb)
struct tcb *tcb;
When the connection has been closed in both directions and all incoming
data has been read, this call is made to cause TCP to reclaim the space
taken up by the TCP control block. Any incoming data remaining unread is lost.
/* Dump a TCP connection state */
state_tcp(tcb)
struct tcb *tcb;
This debugging call prints an ASCII-formatted dump of the TCP connection
state on the terminal. You need a copy of the TCP specification (ARPA RFC
793 or MIL-STD-1778) to interpret most of the numbers.
The User Datagram Protocol (UDP)
UDP is available for simple applications not needing the services of a
reliable protocol like TCP. A minimum of overhead is placed on top of the
"raw" IP datagram service, consisting only of port numbers and a checksum
covering the UDP header and user data. Four functions are available to the
UDP user.
/* Create a UDP control block for lsocket, so that we can queue
* incoming datagrams.
*/
int
open_udp(lsocket,r_upcall)
struct socket *lsocket;
void (*r_upcall)();
open_udp creates a queue to accept incoming datagrams (regardless of
source) addressed to "lsocket". "r_upcall" is an optional upcall mechanism
to provide the address of a function to be called as follows whenever
a datagram arrives:
(*r_upcall)(lsocket,rcvcnt);
struct socket *lsocket; /* Pointer to local socket */
int rcvcnt; /* Count of datagrams pending on queue */
/* Send a UDP datagram */
int
send_udp(lsocket,fsocket,tos,ttl,bp,length,id,df)
struct socket *lsocket; /* Source socket */
struct socket *fsocket; /* Destination socket */
char tos; /* Type-of-service for IP */
char ttl; /* Time-to-live for IP */
struct mbuf *bp; /* Data field, if any */
int16 length; /* Length of data field */
int16 id; /* Optional ID field for IP */
char df; /* Don't Fragment flag for IP */
The parameters passed to send_udp are simply stuffed in the UDP and IP
headers, and the datagram is sent on its way.
/* Accept a waiting datagram, if available. Returns length of datagram */
int
recv_udp(lsocket,fsocket,bp)
struct socket *lsocket; /* Local socket to receive on */
struct socket *fsocket; /* Place to stash incoming socket */
struct mbuf **bp; /* Place to stash data packet */
The "lsocket" pointer indicates the socket the user wishes to receive a
datagram on (a queue must have been created previously with the open_udp
routine). "fsocket" is taken as the address of a socket structure to be
overwritten with the foreign socket associated with the datagram being
read; bp is overwritten with a pointer to the data portion (if any)
of the datagram being received.
/* Delete a UDP control block */
int
del_udp(lsocket)
struct socket *lsocket;
This function destroys any unread datagrams on a queue, and reclaims the
space taken by the queue descriptor.
Phil Karn, KA9Q
12 June 1987