Fresh Fish 8

home *** CD-ROM | disk | FTP | other *** search

/ Fresh Fish 8 / FreshFishVol8-CD2.bin / bbs / dev / oberon-a-1.4ß.lha / Oberon-A / texts / SymbolFiles.Text < prev next >

Wrap

Text File | 1994-02-09 | 55KB | 1,367 lines

Departement Informatik Institut für Computersysteme Eidgenössische Technische Hochschule Zürich Robert Griesemer On the Linearization of Graphs and Writing Symbol Files Cuno Pfister (ed.) Oberon Technical Notes Beat Heeb Josef Templ March 1991 Address of the authors: Computersysteme ETH-Zentrum CH-8092 Zurich, Switzerland e-mail: griesemer@inf.ethz.ch pfister@inf.ethz.ch Copyright (c) 1991 Departement Informatik, ETH Zürich On the Linearization of Graphs and Writing Symbol Files Robert Griesemer Abstract The linearization of general graphs represented by pointer and record data structures is a problem often arising in computer programs. Whenever a graph has to be stored on or transmitted over a sequentially organized carrier, a form of linearization is used. A simple algorithm for this purpose is presented and a special application - the writing of symbol files as required by modern language compilers - is described in more detail. Keywords: Linearization, Graphs, Symbol Files. 1. Linearization of Graphs General graphs occur in various forms in computer science, sophisticated data structures may often be interpreted as graphs. Whenever such a data structure has to be stored on a file or has to be transmitted over a network, the linearization problem occurs. For special forms of non-linear data structures (e.g. trees) well-known solutions exist and are comprehensively described in the basic literature. Nevertheless, for more general data structures the wheel has to be reinvented and literature is difficult to find [ReMö86]. The problem has a twofold nature: firstly, the graph has to be linearized by means of a write algorithm , and secondly, it has to be rebuilt out of its linear description by a read algorithm . In the following, linearization means writing a graph to a file. 1.1 Preconditions First of all, we remember that every general (undirected) graph may be represented by a directed graph, by means of having two directed edges instead of an undirected one. Hence, we concentrate on directed graphs only. Secondly, we consider only rooted and connected graphs; i.e. graphs with a special root node and a path from the root to every other node. Unconnected graphs or graphs with several root nodes are easily extended such, that they obey the above preconditions. Thirdly, as an (academic) restriction only finite graphs are considered. In computer programs, graphs may be represented in various forms, depending on the available notational support (the programming language) and the way the data structure is used (the program). Here we consider only graphs described in terms of pointers and records; i.e. every graph node is represented by a record and every edge is represented by a pointer. Note that possible information attached to the edges of a graph ( labeled edges ) may be interpreted as additional node data. The nodes of such a graph may not all have the same type; i.e. the amount of data in every node may be different and the number of directed edges to other nodes may vary. It depends on the application and available language how different nodes are specified. For the sake of simplicity we identify each node with a positive tag -number (> 0), so that every node type is clearly determined by its tag and vice versa. Then, every node contains a certain amount M(tag) of data and (at most) a certain number N(tag) of directed edges to other nodes. Hence, a general graph node and its edges may be described in the following way (written in Oberon [Wi88]): TYPE Node = POINTER TO NodeDesc; NodeDesc = RECORD tag: INTEGER; (* determines node type; tag > 0 *) data: ARRAY M(tag) OF INTEGER; (* M depends on the node type (tag) *) link: ARRAY N(tag) OF Node (* N depends on the node type (tag) *) END; Note that any data structure represented in any way in computer memory may be written to a file by simply writing out sequentially the corresponding memory area. Reading requires "only" the data to be read into the same memory area (otherwise pointers would be incorrect). But normally this is an inappropriate solution for several reasons: often the data structure is distributed over the entire available memory and writing would lead to an immense waste of space. For reading, the destination memory addresses must be specified which is almost never possible and a file created this way is inherently not portable (to another computer system). Hence, writing of graphs requires pointers to be transformed into another form of reference and vice versa for reading. In addition, reading and writing should be as efficient as possible considering both memory and time; e.g. each node description should appear once and only once on the file. 1.2 The Write Algorithm The write algorithm resembles closely a naive recursive mark-algorithm for garbage collectors, actually it is nearly the same task to be done: all nodes have to be traversed and marked; marking is necessary to avoid a second traversal and to refer to the node's first occurence. While traversing the graph its structure is written to a file. Hence, the node data structure has to be extended by a mark field, which initially must be zero: TYPE Node = POINTER TO NodeDesc; NodeDesc = RECORD mark: INTEGER; (* used for linearization; initially = 0 *) tag: INTEGER; (* determines node type; tag > 0 *) data: ARRAY M(tag) OF INTEGER; (* N depends on the node type (tag) *) link: ARRAY N(tag) OF Node (* M depends on the node type (tag) *) END; We say a node q is marked iff q.mark > 0. The WriteNode procedure (see below) traverses the graph in pre-order, starting with the root node. Whenever an unmarked node is encountered, the node is numbered and thus marked (line 4), the node tag and data are written (lines 4 and 5) and its successor nodes are traversed (line 6). Note that a node must be numbered before its subtrees are traversed because they may refer to it. Numbering is done using the counter NofNodes which is one initially and is then incremented by one with every processed node. Hence, NofNodes is always greater than zero. Whenever an already marked node is traversed, only its negative node number is written out as reference number (line 2). If the node doesn't exist (i.e. q = NIL) zero is written out (line 1): VAR NofNodes: INTEGER; (* node number; initially = 1 *) PROCEDURE WriteNode(q: Node); VAR i: INTEGER; BEGIN (1) IF q = NIL THEN WriteInt(0) (2) ELSIF q.mark > 0 THEN (* already marked *) WriteInt(-q.mark) (3) ELSE (* first occurence *) (4) WriteInt(q.tag); q.mark := NofNodes; INC(NofNodes); (5) i := 0; WHILE i < M(tag) DO WriteInt(q.data[i]); INC(i) END; (6) i := 0; WHILE i < N(tag) DO WriteNode(q.link[i]); INC(i) END (7) END END WriteNode; During execution of WriteNode with argument root all nodes are numbered in the order of their first occurence. References to already written nodes are the negative node numbers. While reading the file this information will be used to reconstruct the graph (see 1.3 The Read Algorithm). Before WriteNode can be called a second time for the same graph or parts of it, the preconditions have to be established again, i.e. the nodes must be unmarked. When using time-stamps, no such unmark pass is necessary (due to an idea of J. Templ, see [PfiHeTe91]: A Symmetric Solution to the Load/Store Problem ). The WriteNode procedure leads to the following file structure (in EBNF, terminal symbols are described in double quotes): Graph = NodeRef | NoNode | NodeDesc. NodeRef = "negative node number (< 0)". NoNode = "0". NodeDesc = "node tag (> 0)" "node data" {Graph}. It is obvious that the algorithm terminates for all graphs: there is only a finite number of nodes and every procedure call works on at least one node. The recursion is stopped when an already marked node and hence a loop is found (line 2) and the algorithm returns to its predecessor node. In line 4 the assignment q.mark := NofNodes is executed only if q.mark = 0, i.e. each node is numbered at most once (because NofNodes > 0). On the other hand, since after numbering of the node