home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Fresh Fish 8
/
FreshFishVol8-CD2.bin
/
bbs
/
dev
/
oberon-a-1.4ß.lha
/
Oberon-A
/
texts
/
SymbolFiles.Text
< prev
next >
Wrap
Text File
|
1994-02-09
|
55KB
|
1,367 lines
Departement Informatik
Institut für Computersysteme
Eidgenössische Technische Hochschule
Zürich
Robert Griesemer
On the Linearization of Graphs and Writing Symbol Files
Cuno Pfister (ed.)
Oberon Technical Notes
Beat Heeb
Josef Templ
March 1991
Address of the authors:
Computersysteme
ETH-Zentrum
CH-8092 Zurich, Switzerland
e-mail:
griesemer@inf.ethz.ch
pfister@inf.ethz.ch
Copyright (c) 1991 Departement Informatik, ETH Zürich
On the Linearization of Graphs and Writing Symbol Files
Robert Griesemer
Abstract
The linearization of general graphs represented by pointer and record
data structures is a problem often arising in computer programs.
Whenever a graph has to be stored on or transmitted over a sequentially
organized carrier, a form of linearization is used. A simple algorithm
for this purpose is presented and a special application - the writing of
symbol files as required by modern language compilers - is described in
more detail.
Keywords: Linearization, Graphs, Symbol Files.
1. Linearization of Graphs
General graphs occur in various forms in computer science, sophisticated
data structures may often be interpreted as graphs. Whenever such a data
structure has to be stored on a file or has to be transmitted over a
network, the linearization problem occurs. For special forms of
non-linear data structures (e.g. trees) well-known solutions exist and
are comprehensively described in the basic literature. Nevertheless, for
more general data structures the wheel has to be reinvented and
literature is difficult to find [ReMö86]. The problem has a twofold
nature: firstly, the graph has to be linearized by means of a write
algorithm , and secondly, it has to be rebuilt out of its linear
description by a read algorithm . In the following, linearization means
writing a graph to a file.
1.1 Preconditions
First of all, we remember that every general (undirected) graph may be
represented by a directed graph, by means of having two directed edges
instead of an undirected one. Hence, we concentrate on directed graphs
only. Secondly, we consider only rooted and connected graphs; i.e.
graphs with a special root node and a path from the root to every other
node. Unconnected graphs or graphs with several root nodes are easily
extended such, that they obey the above preconditions. Thirdly, as an
(academic) restriction only finite graphs are considered.
In computer programs, graphs may be represented in various forms,
depending on the available notational support (the programming language)
and the way the data structure is used (the program). Here we consider
only graphs described in terms of pointers and records; i.e. every graph
node is represented by a record and every edge is represented by a
pointer. Note that possible information attached to the edges of a graph
( labeled edges ) may be interpreted as additional node data.
The nodes of such a graph may not all have the same type; i.e. the
amount of data in every node may be different and the number of directed
edges to other nodes may vary. It depends on the application and
available language how different nodes are specified. For the sake of
simplicity we identify each node with a positive tag -number (> 0), so
that every node type is clearly determined by its tag and vice versa.
Then, every node contains a certain amount M(tag) of data and (at most)
a certain number N(tag) of directed edges to other nodes. Hence, a
general graph node and its edges may be described in the following way
(written in Oberon [Wi88]):
TYPE
Node = POINTER TO NodeDesc;
NodeDesc = RECORD
tag: INTEGER; (* determines node type; tag > 0 *)
data: ARRAY M(tag) OF INTEGER; (* M depends on the node type (tag) *)
link: ARRAY N(tag) OF Node (* N depends on the node type (tag) *)
END;
Note that any data structure represented in any way in computer memory
may be written to a file by simply writing out sequentially the
corresponding memory area. Reading requires "only" the data to be read
into the same memory area (otherwise pointers would be incorrect). But
normally this is an inappropriate solution for several reasons: often
the data structure is distributed over the entire available memory and
writing would lead to an immense waste of space. For reading, the
destination memory addresses must be specified which is almost never
possible and a file created this way is inherently not portable (to
another computer system).
Hence, writing of graphs requires pointers to be transformed into
another form of reference and vice versa for reading. In addition,
reading and writing should be as efficient as possible considering both
memory and time; e.g. each node description should appear once and only
once on the file.
1.2 The Write Algorithm
The write algorithm resembles closely a naive recursive mark-algorithm
for garbage collectors, actually it is nearly the same task to be done:
all nodes have to be traversed and marked; marking is necessary to avoid
a second traversal and to refer to the node's first occurence. While
traversing the graph its structure is written to a file. Hence, the node
data structure has to be extended by a mark field, which initially must
be zero:
TYPE
Node = POINTER TO NodeDesc;
NodeDesc = RECORD
mark: INTEGER; (* used for linearization; initially = 0 *)
tag: INTEGER; (* determines node type; tag > 0 *)
data: ARRAY M(tag) OF INTEGER; (* N depends on the node type (tag) *)
link: ARRAY N(tag) OF Node (* M depends on the node type (tag) *)
END;
We say a node q is marked iff q.mark > 0. The WriteNode procedure (see
below) traverses the graph in pre-order, starting with the root node.
Whenever an unmarked node is encountered, the node is numbered and thus
marked (line 4), the node tag and data are written (lines 4 and 5) and
its successor nodes are traversed (line 6). Note that a node must be
numbered before its subtrees are traversed because they may refer to it.
Numbering is done using the counter NofNodes which is one initially and
is then incremented by one with every processed node. Hence, NofNodes is
always greater than zero. Whenever an already marked node is traversed,
only its negative node number is written out as reference number (line
2). If the node doesn't exist (i.e. q = NIL) zero is written out (line
1):
VAR
NofNodes: INTEGER; (* node number; initially = 1 *)
PROCEDURE WriteNode(q: Node);
VAR i: INTEGER;
BEGIN
(1) IF q = NIL THEN WriteInt(0)
(2) ELSIF q.mark > 0 THEN (* already marked *) WriteInt(-q.mark)
(3) ELSE (* first occurence *)
(4) WriteInt(q.tag); q.mark := NofNodes; INC(NofNodes);
(5) i := 0; WHILE i < M(tag) DO WriteInt(q.data[i]); INC(i) END;
(6) i := 0; WHILE i < N(tag) DO WriteNode(q.link[i]); INC(i) END
(7) END
END WriteNode;
During execution of WriteNode with argument root all nodes are numbered
in the order of their first occurence. References to already written
nodes are the negative node numbers. While reading the file this
information will be used to reconstruct the graph (see 1.3 The Read
Algorithm). Before WriteNode can be called a second time for the same
graph or parts of it, the preconditions have to be established again,
i.e. the nodes must be unmarked. When using time-stamps, no such unmark
pass is necessary (due to an idea of J. Templ, see [PfiHeTe91]: A
Symmetric Solution to the Load/Store Problem ). The WriteNode procedure
leads to the following file structure (in EBNF, terminal symbols are
described in double quotes):
Graph = NodeRef | NoNode | NodeDesc.
NodeRef = "negative node number (< 0)".
NoNode = "0".
NodeDesc = "node tag (> 0)" "node data" {Graph}.
It is obvious that the algorithm terminates for all graphs: there is
only a finite number of nodes and every procedure call works on at least
one node. The recursion is stopped when an already marked node and hence
a loop is found (line 2) and the algorithm returns to its predecessor
node. In line 4 the assignment q.mark := NofNodes is executed only if
q.mark = 0, i.e. each node is numbered at most once (because NofNodes >
0). On the other hand, since after numbering of the node