Usenet 1994 January

home *** CD-ROM | disk | FTP | other *** search

/ Usenet 1994 January / usenetsourcesnewsgroupsinfomagicjanuary1994.iso / sources / misc / volume23 / quranref / part03 < prev next >

Wrap

Text File | 1991-10-19 | 32.2 KB | 1,117 lines

Newsgroups: comp.sources.misc From: goer@midway.uchicago.edu (Richard L. Goerwitz) Subject: v23i069: quranref - Holy Qur'an word and passage based retrievals, Part03/08 Message-ID: <1991Oct19.022243.12852@sparky.imd.sterling.com> X-Md4-Signature: c3c0b2565f4fdb2b1b848a6961d6826c Date: Sat, 19 Oct 1991 02:22:43 GMT Approved: kent@sparky.imd.sterling.com Submitted-by: goer@midway.uchicago.edu (Richard L. Goerwitz) Posting-number: Volume 23, Issue 69 Archive-name: quranref/part03 Environment: Icon ---- Cut Here and feed the following to sh ---- #!/bin/sh # this is quranref.03 (part 3 of a multipart archive) # do not concatenate these parts, unpack them in order with /bin/sh # file inbits.icn continued # if test ! -r _shar_seq_.tmp; then echo 'Please unpack part 1 first!' exit 1 fi (read Scheck if test "$Scheck" != 3; then echo Please unpack part "$Scheck" next! exit 1 else exit 0 fi ) < _shar_seq_.tmp || exit 1 if test ! -f _shar_wnt_.tmp; then echo 'x - still skipping inbits.icn' else echo 'x - continuing file inbits.icn' sed 's/^X//' << 'SHAR_EOF' >> 'inbits.icn' && X byte_length := 8 X } X X old_byte_mask := (0 < 2^old_len - 1) | 0 X old_byte := iand(old_byte, old_byte_mask) X i := ishift(old_byte, len-old_len) X X len -:= (len > old_len) | { X old_len -:= len X return i X } X X while byte := ord(reads(f)) do { X i := ior(i, ishift(byte, len-byte_length)) X len -:= (len > byte_length) | { X old_len := byte_length-len X old_byte := byte X return i X } X } X Xend SHAR_EOF echo 'File inbits.icn is complete' && true || echo 'restore of inbits.icn failed' rm -f _shar_wnt_.tmp fi # ============= rewrap.icn ============== if test -f 'rewrap.icn' -a X"$1" != X"-c"; then echo 'x - skipping rewrap.icn (File already exists)' rm -f _shar_wnt_.tmp else > _shar_wnt_.tmp echo 'x - extracting rewrap.icn (Text)' sed 's/^X//' << 'SHAR_EOF' > 'rewrap.icn' && X############################################################################ X# X# Name: rewrap.icn X# X# Title: advanced line rewrap utility X# X# Author: Richard L. Goerwitz X# X# Version: 1.4 X# X############################################################################ X# X# The procedure rewrap(s,i), included in this file, reformats text X# fed to it into strings < i in length. Rewrap utilizes a static X# buffer, so it can be called repeatedly with different s arguments, X# and still produce homogenous output. This buffer is flushed by X# calling rewrap with a null first argument. The default for X# argument 2 (i) is 70. X# X# Here's a simple example of how rewrap could be used. The following X# program reads the standard input, producing fully rewrapped output. X# X# procedure main() X# every write(rewrap(!&input)) X# write(rewrap()) X# end X# X# Naturally, in practice you would want to do things like check for in- X# dentation or blank lines in order to wrap only on a paragraph-by para- X# graph basis, as in X# X# procedure main() X# while line := read(&input) do { X# if line == "" then { X# write("" ~== rewrap()) X# write(line) X# } else { X# if match("\t", line) then { X# write(rewrap()) X# write(rewrap(line)) X# } else { X# write(rewrap(line)) X# } X# } X# } X# end X# X# Fill-prefixes can be implemented simply by prepending them to the X# output of rewrap: X# X# i := 70; fill_prefix := " > " X# while line := read(input_file) do { X# line ?:= (f_bit := tab(many('> ')) | "", tab(0)) X# write(fill_prefix || f_bit || rewrap(line, i - *fill_prefix)) X# etc. X# X# Obviously, these examples are fairly simplistic. Putting them to X# actual use would certainly require a few environment-specific X# modifications and/or extensions. Still, I hope they offer some X# indication of the kinds of applications rewrap might be used in. X# X# Note: If you want leading and trailing tabs removed, map them to X# spaces first. Rewrap only fools with spaces, leaving tabs intact. X# This can be changed easily enough, by running its input through the X# Icon detab() function. X# X############################################################################ X# X# See also: wrap.icn X# X############################################################################ X X Xprocedure rewrap(s,i) X X local extra_bit, line X static old_line X initial old_line := "" X X # Default column to wrap on is 70. X /i := 70 X # Flush buffer on null first argument. X if /s then { X extra_bit := old_line X old_line := "" X return "" ~== extra_bit X } X X # Prepend to s anything that is in the buffer (leftovers from the last s). X s ?:= { tab(many(' ')); old_line || trim(tab(0)) } X X # If the line isn't long enough, just add everything to old_line. X if *s < i then old_line := s || " " & fail X X s ? { X X # While it is possible to find places to break s, do so. X while any(' -',line := EndToFront(i),-1) do { X # Clean up and suspend the last piece of s tabbed over. X line ?:= (tab(many(' ')), trim(tab(0))) X if *&subject - &pos + *line > i X then suspend line X else { X old_line := "" X return line || tab(0) X } X } X X # Keep the extra section of s in a buffer. X old_line := tab(0) X X # If the reason the remaining section of s was unrewrapable was X # that it was too long, and couldn't be broken up, then just return X # the thing as-is. X if *old_line > i then { X old_line ? { X if extra_bit := tab(upto(' -')+1) || (tab(many(' ')) | "") X then old_line := tab(0) X else extra_bit := old_line & old_line := "" X return trim(extra_bit) X } X } X # Otherwise, clean up the buffer for prepending to the next s. X else { X # If old_line is blank, then don't mess with it. Otherwise, X # add whatever is needed in order to link it with the next s. X if old_line ~== "" then { X # If old_line ends in a dash, then there's no need to add a X # space to it. X if old_line[-1] ~== "-" X then old_line ||:= " " X } X } X } X Xend X X X Xprocedure EndToFront(i) X # Goes with rewrap(s,i) X *&subject+1 - &pos >= i | fail X suspend &subject[.&pos:&pos <- &pos+i to &pos by -1] Xend SHAR_EOF true || echo 'restore of rewrap.icn failed' rm -f _shar_wnt_.tmp fi # ============= findre.icn ============== if test -f 'findre.icn' -a X"$1" != X"-c"; then echo 'x - skipping findre.icn (File already exists)' rm -f _shar_wnt_.tmp else > _shar_wnt_.tmp echo 'x - extracting findre.icn (Text)' sed 's/^X//' << 'SHAR_EOF' > 'findre.icn' && X######################################################################## X# X# Name: findre.icn X# X# Title: "Find" Regular Expression X# X# Author: Richard L. Goerwitz X# X# Version: 1.17 X# X######################################################################## X# X# I place this and any later versions in the public domain - RLG. X# X######################################################################## X# X# DESCRIPTION: findre() is like the Icon builtin function find(), X# except that it takes, as its first argument, a regular expression X# pretty much like the ones the Unix egrep command uses (the few X# minor differences are listed below). Its syntax is the same as X# find's (i.e. findre(s1,s2,i,j)), with the exception that a no- X# argument invocation wipes out all static structures utilized by X# findre, and then forces a garbage collection. X# X# (For those not familiar with regular expressions and the Unix egrep X# command: findre() offers a simple and compact wildcard-based search X# system. If you do a lot of searches through text files, or write X# programs which do searches based on user input, then findre is a X# utility you might want to look over.) X# X# IMPORTANT DIFFERENCES between find and findre: As noted above, X# findre() is just a find() function that takes a regular expression X# as its first argument. One major problem with this setup is that X# it leaves the user with no easy way to tab past a matched X# substring, as with X# X# s ? write(tab(find("hello")+5)) X# X# In order to remedy this intrinsic deficiency, findre() sets the X# global variable __endpoint to the first position after any given X# match occurs. Use this variable with great care, preferably X# assigning its value to some other variable immediately after the X# match (for example, findre("hello [.?!]*",s) & tmp := __endpoint). X# Otherwise, you will certainly run into trouble. (See the example X# below for an illustration of how __endpoint is used). X# X# IMPORTANT DIFFERENCES between egrep and findre: findre utilizes X# the same basic language as egrep. The only big difference is that X# findre uses intrinsic Icon data structures and escaping conven- X# tions rather than those of any particular Unix variant. Be care- X# ful! If you put findre("$hello$",s) into your source file, X# findre will treat it just like findre("(hello)",s). If, however, X# you enter '$hello$' at run-time (via, say, findre(!&input,s)), X# what Icon receives will depend on your operating system (most X# likely, a trace will show "\$hello\$"). X# X# BUGS: Space has essentially been conserved at the expense of time X# in the automata produced by findre(). The algorithm, in other X# words, will produce the equivalent of a pushdown automaton under X# certain circumstances, rather than strive (at the expense of space) X# for full determinism. I tried to make up a nfa -> dfa converter X# that would only create that portion of the dfa it needed to accept X# or reject a string, but the resulting automaton was actually quite X# slow (if anyone can think of a way to do this in Icon, and keep it X# small and fast, please let us all know about it). Note that under X# version 8 of Icon, findre takes up negligible storage space, due to X# the much improved hashing algorithm. I have not tested it under X# version 7, but I would expect it to use up quite a bit more space X# in that environment. X# X# IMPORTANT NOTE: Findre takes a shortest-possible-match approach X# to regular expressions. In other words, if you look for "a*", X# findre will not even bother looking for an "a." It will just match X# the empty string. Without this feature, findre would perform a bit X# more slowly. The problem with such an approach is that often the X# user will want to tab past the longest possible string of matched X# characters (say tab((findre("a*|b*"), __endpoint)). In circumstan- X# ces like this, please just use something like: X# X# s ? { X# tab(find("a")) & # or use Arb() from the IPL (patterns.icn) X# tab(many('a')) X# tab(many('b')) X# } X# X# or else use some combination of findre and the above. X# X######################################################################## X# X# REGULAR EXPRESSION SYNTAX: Regular expression syntax is complex, X# and yet simple. It is simple in the sense that most of its power X# is concentrated in about a dozen easy-to-learn symbols. It is X# complex in the sense that, by combining these symbols with X# characters, you can represent very intricate patterns. X# X# I make no pretense here of offering a full explanation of regular X# expressions, their usage, and the deeper nuances of their syntax. X# As noted above, this should be gleaned from a Unix manual. For X# quick reference, however, I have included a brief summary of all X# the special symbols used, accompanied by an explanation of what X# they mean, and, in some cases, of how they are used (most of this X# is taken from the comments prepended to Jerry Nowlin's Icon-grep X# command, as posted a couple of years ago): X# X# ^ - matches if the following pattern is at the beginning X# of a line (i.e. ^# matches lines beginning with "#") X# $ - matches if the preceding pattern is at the end of a line X# . - matches any single character X# + - matches from 1 to any number of occurrences of the X# previous expression (i.e. a character, or set of paren- X# thesized/bracketed characters) X# * - matches from 0 to any number of occurrences of the previous X# expression X# \ - removes the special meaning of any special characters X# recognized by this program (i.e if you want to match lines X# beginning with a "[", write ^\[, and not ^[) X# | - matches either the pattern before it, or the one after X# it (i.e. abc|cde matches either abc or cde) X# [] - matches any member of the enclosed character set, or, X# if ^ is the first character, any nonmember of the X# enclosed character set (i.e. [^ab] matches any character X# _except_ a and b). X# () - used for grouping (e.g. ^(abc|cde)$ matches lines consist- X# ing of either "abc" or "cde," while ^abc|cde$ matches X# lines either beginning with "abc" or ending in "cde") X# X######################################################################### X# X# EXAMPLE program: X# X# procedure main(a) X# while line := !&input do { X# token_list := tokenize_line(line,a[1]) X# every write(!token_list) X# } X# end X# X# procedure tokenize_line(s,sep) X# tmp_lst := [] X# s ? { X# while field := tab(findre(sep)|0) & X# mark := __endpoint X# do { X# put(tmp_lst,"" ~== field) X# if pos(0) then break X# else tab(mark) X# } X# } X# return tmp_lst X# end X# X# The above program would be compiled with findre (e.g. "icont X# test_prg.icn findre.icn") to produce a single executable which X# tokenizes each line of input based on a user-specified delimiter. X# Note how __endpoint is set soon after findre() succeeds. Note X# also how empty fields are excluded with "" ~==, etc. Finally, note X# that the temporary list, tmp_lst, is not needed. It is included X# here merely to illustrate one way in which tokens might be stored. X# X# Tokenizing is, of course, only one of many uses one might put X# findre to. It is very helpful in allowing the user to construct X# automata at run-time. If, say, you want to write a program that X# searches text files for patterns given by the user, findre would be X# a perfect utility to use. Findre in general permits more compact X# expression of patterns than one can obtain using intrinsic Icon X# scanning facilities. Its near complete compatibility with the Unix X# regexp library, moreover, makes for greater ease of porting, X# especially in cases where Icon is being used to prototype C code. X# X######################################################################### X X Xglobal state_table, parends_present, slash_present Xglobal biggest_nonmeta_str, __endpoint Xrecord o_a_s(op,arg,state) X X Xprocedure findre(re, s, i, j) X X local p, default_val, x, nonmeta_len, tokenized_re, tmp X static FSTN_table, STRING_table X initial { X FSTN_table := table() X STRING_table := table() X } X X if /re then { X FSTN_table := table() X STRING_table := table() X collect() # do it *now* X return X } X X if /s := &subject X then default_val := &pos X else default_val := 1 X X if \i then { X if i < 1 then X i := *s + (i+1) X } X else i := default_val X X if \j then { X if j < 1 then X j := *s + (j+1) X } X else j := *s+1 X X if /FSTN_table[re] then { X # If we haven't seen this re before, then... X if \STRING_table[re] then { X # ...if it's in the STRING_table, use plain find() X every p := find(STRING_table[re],s,i,j) X do { __endpoint := p + *STRING_table[re]; suspend p } X fail X } X else { X # However, if it's not in the string table, we have to X # tokenize it and check for metacharacters. If it has X # metas, we create an FSTN, and put that into FSTN_table; X # otherwise, we just put it into the STRING_table. X tokenized_re := tokenize(re) X if 0 > !tokenized_re then { X # if at least one element is < 0, re has metas X MakeFSTN(tokenized_re) | err_out(re,2) X # both biggest_nonmeta_str and state_table are global X /FSTN_table[re] := [.biggest_nonmeta_str, copy(state_table)] X } X else { X # re has no metas; put the input string into STRING_table X # for future reference, and execute find() at once X tmp := ""; every tmp ||:= char(!tokenized_re) X insert(STRING_table,re,tmp) X every p := find(STRING_table[re],s,i,j) X do { __endpoint := p + *STRING_table[re]; suspend p } X fail X } X } X } X X X if nonmeta_len := (1 < *FSTN_table[re][1]) then { X # If the biggest non-meta string in the original re X # was more than 1, then put in a check for it... X s[1:j] ? { X tab(x := i to j - nonmeta_len) & X (find(FSTN_table[re][1]) | fail) \ 1 & X (__endpoint := apply_FSTN(&null,FSTN_table[re][2])) & X (suspend x) X } X } X else { X #...otherwise it's not worth worrying about the biggest nonmeta str X s[1:j] ? { X tab(x := i to j) & X (__endpoint := apply_FSTN(&null,FSTN_table[re][2])) & X (suspend x) X } X } X Xend X X X Xprocedure apply_FSTN(ini,tbl) X X local biggest_pos, POS, tmp, fin X static s_tbl X X /ini := 1 & s_tbl := tbl & biggest_pos := 1 X if ini = 0 then { X return &pos X } X POS := &pos X fin := 0 X X repeat { X if tmp := !s_tbl[ini] & X tab(tmp.op(tmp.arg)) X then { X if tmp.state = fin X then return &pos X else ini := tmp.state X } X else (&pos := POS, fail) X } X Xend X X X Xprocedure tokenize(s) X X local token_list, chr, tmp, b_loc, next_one, fixed_length_token_list, i X X token_list := list() X s ? { X tab(many('*+?|')) X while chr := move(1) do { X if chr == "\\" X # it can't be a metacharacter; remove the \ and "put" X # the integer value of the next chr into token_list X then put(token_list,ord(move(1))) | err_out(s,2,chr) X else if any('*+()|?.$^',chr) X then { X # Yuck! Egrep compatibility stuff. X case chr of { X "*" : { X tab(many('*+?')) X put(token_list,-ord("*")) X } X "+" : { X tmp := tab(many('*?+')) | &null X if upto('*?',\tmp) X then put(token_list,-ord("*")) X else put(token_list,-ord("+")) X } X "?" : { X tmp := tab(many('*?+')) | &null X if upto('*+',\tmp) X then put(token_list,-ord("*")) X else put(token_list,-ord("?")) X } X "(" : { X tab(many('*+?')) X put(token_list,-ord("(")) X } X default: { X put(token_list,-ord(chr)) X } X } X } X else { X case chr of { X # More egrep compatibility stuff. X "[" : { X b_loc := find("[") | *&subject+1 X every next_one := find("]",,,b_loc) X \next_one ~= &pos | err_out(s,2,chr) X put(token_list,-ord(chr)) X } X "]" : { X if &pos = (\next_one+1) X then put(token_list,-ord(chr)) & X next_one := &null X else put(token_list,ord(chr)) X } X default: put(token_list,ord(chr)) X } X } X } X } X X token_list := UnMetaBrackets(token_list) X X fixed_length_token_list := list(*token_list) X every i := 1 to *token_list X do fixed_length_token_list[i] := token_list[i] X return fixed_length_token_list X Xend X X X Xprocedure UnMetaBrackets(l) X X # Since brackets delineate a cset, it doesn't make X # any sense to have metacharacters inside of them. X # UnMetaBrackets makes sure there are no metacharac- X # ters inside of the braces. X X local tmplst, i, Lb, Rb X X tmplst := list(); i := 0 X Lb := -ord("[") X Rb := -ord("]") X X while (i +:= 1) <= *l do { X if l[i] = Lb then { X put(tmplst,l[i]) X until l[i +:= 1] = Rb X do put(tmplst,abs(l[i])) X put(tmplst,l[i]) X } X else put(tmplst,l[i]) X } X return tmplst X Xend X X X Xprocedure MakeFSTN(l,INI,FIN) X X # MakeFSTN recursively descends through the tree structure X # implied by the tokenized string, l, recording in (global) X # fstn_table a list of operations to be performed, and the X # initial and final states which apply to them. X X local i, inter, inter2, tmp, Op, Arg X static Lp, Rp, Sl, Lb, Rb, Caret_inside, Dot, Dollar, Caret_outside X # global biggest_nonmeta_str, slash_present, parends_present X initial { X Lp := -ord("("); Rp := -ord(")") X Sl := -ord("|") X Lb := -ord("["); Rb := -ord("]"); Caret_inside := ord("^") X Dot := -ord("."); Dollar := -ord("$"); Caret_outside := -ord("^") X } X X /INI := 1 & state_table := table() & X NextState("new") & biggest_nonmeta_str := "" X /FIN := 0 X X # I haven't bothered to test for empty lists everywhere. X if *l = 0 then { X /state_table[INI] := [] X put(state_table[INI],o_a_s(zSucceed,&null,FIN)) X return X } X X # HUNT DOWN THE SLASH (ALTERNATION OPERATOR) X every i := 1 to *l do { X if l[i] = Sl & tab_bal(l,Lp,Rp) = i then { X if i = 1 then err_out(l,2,char(abs(l[i]))) else { X /slash_present := "yes" X inter := NextState() X inter2:= NextState() X MakeFSTN(l[1:i],inter2,FIN) X MakeFSTN(l[i+1:0],inter,FIN) X /state_table[INI] := [] X put(state_table[INI],o_a_s(apply_FSTN,inter2,0)) X put(state_table[INI],o_a_s(apply_FSTN,inter,0)) X return X } X } X } X X # HUNT DOWN PARENTHESES X if l[1] = Lp then { X i := tab_bal(l,Lp,Rp) | err_out(l,2,"(") X inter := NextState() X if any('*+?',char(abs(0 > l[i+1]))) then { X case l[i+1] of { X -ord("*") : { X /state_table[INI] := [] X put(state_table[INI],o_a_s(apply_FSTN,inter,0)) X MakeFSTN(l[2:i],INI,INI) X MakeFSTN(l[i+2:0],inter,FIN) X return X } X -ord("+") : { X inter2 := NextState() X /state_table[inter2] := [] X MakeFSTN(l[2:i],INI,inter2) X put(state_table[inter2],o_a_s(apply_FSTN,inter,0)) X MakeFSTN(l[2:i],inter2,inter2) X MakeFSTN(l[i+2:0],inter,FIN) X return X } X -ord("?") : { X /state_table[INI] := [] X put(state_table[INI],o_a_s(apply_FSTN,inter,0)) X MakeFSTN(l[2:i],INI,inter) X MakeFSTN(l[i+2:0],inter,FIN) X return X } X } X } X else { X MakeFSTN(l[2:i],INI,inter) X MakeFSTN(l[i+1:0],inter,FIN) X return X } X } X else { # I.E. l[1] NOT = Lp (left parenthesis as -ord("(")) X every i := 1 to *l do { X case l[i] of { X Lp : { X inter := NextState() X MakeFSTN(l[1:i],INI,inter) X /parends_present := "yes" X MakeFSTN(l[i:0],inter,FIN) X return X } X Rp : err_out(l,2,")") X } X } X } X X # NOW, HUNT DOWN BRACKETS X if l[1] = Lb then { X i := tab_bal(l,Lb,Rb) | err_out(l,2,"[") X inter := NextState() X tmp := ""; every tmp ||:= char(l[2 to i-1]) X if Caret_inside = l[2] X then tmp := ~cset(Expand(tmp[2:0])) X else tmp := cset(Expand(tmp)) X if any('*+?',char(abs(0 > l[i+1]))) then { X case l[i+1] of { X -ord("*") : { X /state_table[INI] := [] X put(state_table[INI],o_a_s(apply_FSTN,inter,0)) X put(state_table[INI],o_a_s(any,tmp,INI)) X MakeFSTN(l[i+2:0],inter,FIN) X return X } X -ord("+") : { X inter2 := NextState() X /state_table[INI] := [] X put(state_table[INI],o_a_s(any,tmp,inter2)) X /state_table[inter2] := [] X put(state_table[inter2],o_a_s(apply_FSTN,inter,0)) X put(state_table[inter2],o_a_s(any,tmp,inter2)) X MakeFSTN(l[i+2:0],inter,FIN) X return X } X -ord("?") : { X /state_table[INI] := [] X put(state_table[INI],o_a_s(apply_FSTN,inter,0)) X put(state_table[INI],o_a_s(any,tmp,inter)) X MakeFSTN(l[i+2:0],inter,FIN) X return X } X } X } X else { X /state_table[INI] := [] X put(state_table[INI],o_a_s(any,tmp,inter)) X MakeFSTN(l[i+1:0],inter,FIN) X return X } X } X else { # I.E. l[1] not = Lb X every i := 1 to *l do { X case l[i] of { X Lb : { X inter := NextState() X MakeFSTN(l[1:i],INI,inter) X MakeFSTN(l[i:0],inter,FIN) X return X } X Rb : err_out(l,2,"]") X } X } X } X X # FIND INITIAL SEQUENCES OF POSITIVE INTEGERS, CONCATENATE THEM X if i := match_positive_ints(l) then { X inter := NextState() X tmp := Ints2String(l[1:i]) X # if a slash has been encountered already, forget optimizing X # in this way; if parends are present, too, then forget it, X # unless we are at the beginning or end of the input string X if INI = 1 | FIN = 2 | /parends_present & X /slash_present & *tmp > *biggest_nonmeta_str X then biggest_nonmeta_str := tmp X /state_table[INI] := [] X put(state_table[INI],o_a_s(match,tmp,inter)) X MakeFSTN(l[i:0],inter,FIN) X return X } X X # OKAY, CLEAN UP ALL THE JUNK THAT'S LEFT X i := 0 X while (i +:= 1) <= *l do { X case l[i] of { X Dot : { Op := any; Arg := &cset } X Dollar : { Op := pos; Arg := 0 } X Caret_outside: { Op := pos; Arg := 1 } X default : { Op := match; Arg := char(0 < l[i]) } X } | err_out(l,2,char(abs(l[i]))) X inter := NextState() X if any('*+?',char(abs(0 > l[i+1]))) then { X case l[i+1] of { X -ord("*") : { X /state_table[INI] := [] X put(state_table[INI],o_a_s(apply_FSTN,inter,0)) X put(state_table[INI],o_a_s(Op,Arg,INI)) X MakeFSTN(l[i+2:0],inter,FIN) X return X } X -ord("+") : { X inter2 := NextState() X /state_table[INI] := [] X put(state_table[INI],o_a_s(Op,Arg,inter2)) X /state_table[inter2] := [] X put(state_table[inter2],o_a_s(apply_FSTN,inter,0)) X put(state_table[inter2],o_a_s(Op,Arg,inter2)) X MakeFSTN(l[i+2:0],inter,FIN) X return X } X -ord("?") : { X /state_table[INI] := [] X put(state_table[INI],o_a_s(apply_FSTN,inter,0)) X put(state_table[INI],o_a_s(Op,Arg,inter)) X MakeFSTN(l[i+2:0],inter,FIN) X return X } X } X } X else { X /state_table[INI] := [] X put(state_table[INI],o_a_s(Op,Arg,inter)) X MakeFSTN(l[i+1:0],inter,FIN) X return X } X } X X # WE SHOULD NOW BE DONE INSERTING EVERYTHING INTO state_table X # IF WE GET TO HERE, WE'VE PARSED INCORRECTLY! X err_out(l,4) X Xend X X X Xprocedure NextState(new) X static nextstate X if \new then nextstate := 1 X else nextstate +:= 1 X return nextstate Xend X X X Xprocedure err_out(x,i,elem) X writes(&errout,"Error number ",i," parsing ",image(x)," at ") X if \elem X then write(&errout,image(elem),".") X else write(&errout,"(?).") X exit(i) Xend X X X Xprocedure zSucceed() X return .&pos Xend X X X Xprocedure Expand(s) X X local s2, c1, c2 X X s2 := "" X s ? { X s2 ||:= ="^" X s2 ||:= ="-" X while s2 ||:= tab(find("-")-1) do { X if (c1 := move(1), ="-", X c2 := move(1), X c1 << c2) X then every s2 ||:= char(ord(c1) to ord(c2)) X else s2 ||:= 1(move(2), not(pos(0))) | err_out(s,2,"-") X } X s2 ||:= tab(0) X } X return s2 X Xend X X X Xprocedure tab_bal(l,i1,i2) X X local i, i1_count, i2_count X X i := 0 X i1_count := 0; i2_count := 0 X while (i +:= 1) <= *l do { X case l[i] of { X i1 : i1_count +:= 1 X i2 : i2_count +:= 1 X } X if i1_count = i2_count X then suspend i X } X Xend X X Xprocedure match_positive_ints(l) X X # Matches the longest sequence of positive integers in l, X # beginning at l[1], which neither contains, nor is fol- X # lowed by a negative integer. Returns the first position X # after the match. Hence, given [55, 55, 55, -42, 55], X # match_positive_ints will return 3. [55, -42] will cause X # it to fail rather than return 1 (NOTE WELL!). X X local i X X every i := 1 to *l do { X if l[i] < 0 X then return (3 < i) - 1 | fail X } X return *l + 1 X Xend X X Xprocedure Ints2String(l) X X local tmp X X tmp := "" X every tmp ||:= char(!l) X return tmp X Xend X X Xprocedure StripChar(s,s2) X X local tmp X X if find(s2,s) then { X tmp := "" X s ? { X while tmp ||:= tab(find("s2")) X do tab(many(cset(s2))) X tmp ||:= tab(0) X } X } X return \tmp | s X Xend SHAR_EOF true || echo 'restore of findre.icn failed' rm -f _shar_wnt_.tmp fi # ============= huffcode.icn ============== if test -f 'huffcode.icn' -a X"$1" != X"-c"; then echo 'x - skipping huffcode.icn (File already exists)' rm -f _shar_wnt_.tmp else > _shar_wnt_.tmp echo 'x - extracting huffcode.icn (Text)' sed 's/^X//' << 'SHAR_EOF' > 'huffcode.icn' && X############################################################################ X# X# Name: huffcode.icn X# X# Title: huffman coding tools X# X# Author: Richard L. Goerwitz X# X# Version: 1.4 X# X############################################################################ X# X# An odd assortment of tools that lets me compress text using an X# Iconish version of a generic Huffman algorithm. See block_encode(). X# X############################################################################ X# X# Links: outbits.icn inbits.icn X# X# See also: press.icn X# X############################################################################ X Xrecord node(l,r,n) Xrecord _N(l,r) Xrecord leaf(c,n) Xrecord hcode(c,i,len) X X# For debugging purposes. X# link ximage X Xprocedure count_chars(s, char_tbl) X X # X # Count chars in s, placing stats in char_tbl (keys = chars in X # s, values = leaf records, with the counts for each chr in s X # contained in char_tbl[chr].n). X # X local chr X initial { X /char_tbl & stop("count_chars: need 2 args - 1 string, 2 table") X *char_tbl ~= 0 & stop("count_chars: start me with an empty table!") X } X X s ? { X while chr := move(1) do { X /char_tbl[chr] := leaf(chr,0) X char_tbl[chr].n +:= 1 X } X } X X# write(ximage(char_tbl)) X return *char_tbl # for lack of anything better X Xend X X Xprocedure heap_init(char_tbl) X X # X # Create heap data structure out of the table filled out by X # successive calls to count_chars(s,t). The heap is just a X # list. Naturally, it's size can be obtained via *heap. X # X local heap X X heap := list() X every push(heap, !char_tbl) do { X resettle_heap(heap, 1) X# write(ximage(heap)) X } X X return heap X Xend X X Xprocedure resettle_heap(h, k) X X # X # Based loosely on Sedgewick (2nd. ed., 1988), p. 160. Take k-th X # node on the heap, and walk down the heap, switching this node X # along the way with the child whose value is the least AND whose X # value is less than this node's. Stop when you find no children X # whose value is less than that of the original node. Elements on X # heap are records of type leaf, with the values contained in the X # "n" field. X # X local j X X # While we haven't spilled off the end of the heap (the size of the X # heap is *h; *h / 2 is the biggest k we need to look at)... X while k <= (*h / 2) do { X X # ...double k, assign the result to j. X j := k+k X X # If we aren't at the end of the heap... X if j < *h then { X # ...check to see which of h[k]'s children is the smallest, X # and make j point to it. X if h[j].n > h[j+1].n then X # h[j] :=: h[j+1] X j +:= 1 X } X X # If the current parent (h[k]) has a value less than those of its X # children, then break; we're done. X if h[k].n <= h[j].n then break X X # Otherwise, switch the parent for the child, and loop around X # again, with k (the pointer to the parent) now pointing to the X # new offset of the element we have been working on. X h[k] :=: h[j] X k := j X X } X X return k X Xend X X Xprocedure heap_2_huffman_tree(h) X X # X # Construct the Huffman tree out of heap h. Find the smallest X # element, pop it off the heap, then reshuffle the heap. After X # reshuffling, replace the top record on the stack with a node() X # record whose n field equal to the sum of the n fields for the X # element popped off the stack originally, and the one that is X # now about to be replaced. Link the new node record to the 2 X # elements on the heap it is now replacing. Reshuffle the heap X # again, then repeat. You're done when the size of the heap is SHAR_EOF true || echo 'restore of huffcode.icn failed' fi echo 'End of part 3' echo 'File huffcode.icn is continued in part 4' echo 4 > _shar_seq_.tmp exit 0 exit 0 # Just in case... -- Kent Landfield INTERNET: kent@sparky.IMD.Sterling.COM Sterling Software, IMD UUCP: uunet!sparky!kent Phone: (402) 291-8300 FAX: (402) 291-4362 Please send comp.sources.misc-related mail to kent@uunet.uu.net.