home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Usenet 1994 October
/
usenetsourcesnewsgroupsinfomagicoctober1994disk2.iso
/
misc
/
volume1
/
8711
/
2
< prev
next >
Wrap
Internet Message Format
|
1990-07-13
|
47KB
Path: uunet!seismo!sundc!pitstop!sun!amdcad!ames!necntc!ncoast!allbery
From: sommar@enea.se.UUCP (Erland Sommarskog)
Newsgroups: comp.sources.misc
Subject: A customable string-comparison package
Message-ID: <4977@ncoast.UUCP>
Date: 2 Nov 87 02:40:22 GMT
Sender: allbery@ncoast.UUCP
Lines: 1126
Approved: allbery@ncoast.UUCP
X-Archive: comp.sources.misc/8711/2
This is to be posted in comp.sources.misc. Thank you.
This posting contains a package for string-comparisons
in fairly sophisticated way where reagrd to accents,
non-letters and case is only taken if necessary.
The user defines how each character should be sorted,
whether if it is a letter, and whether it is a variant
of another letter. See READ ME for complete description.
The purpose is not to provide a facility, though,
rather to demonstrate the idea. The code is in Ada,
so you may have problem with using it straight off,
if you don't have access to a compiler. Still you can
take advantage of the ideas if you like. Comments and
questions are welcome to:
Erland Sommarskog
ENEA Data, Stockholm
sommar@enea
----------------------------------------------------------------------
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create:
# READ ME
# comline.a
# define.a
# latin1.a
# main.a
# natascii.a
# strcompb.a
# strcomps.a
# This archive created: Fri Oct 30 23:01:51 1987
export PATH; PATH=/bin:/usr/bin:$PATH
if test -f 'READ ME'
then
echo shar: "will not over-write existing file 'READ ME'"
else
cat << \SHAR_EOF > 'READ ME'
The intention of this posting is not to provide a facility, but
rather to demonstrate a technique to do string comparisons
in a more sophisticated way than simply using ASCII values.
Comments, questions etc are very welcome to:
Erland Sommarskog
ENEA Data, Stockholm
sommar@enea.UUCP
The posting contains seven files that can be divided into three
groups:
I: strcompS.a and strcompB.a
The core of the posting. They contain a package for string
comparisons. It has a character-transscription table to be
loaded by the user and comparison operators for trans-
scripted string. The exported routines are described below.
StrcompS is the specification, whereas strcompB contains
the package body.
II: latin1.a and natascii.a
They declare names for characters, to be used, for example,
when defining a collating sequence for the package above.
Latin1 declares names for the ISO standard 8859/1. Natascii
declares names for national replacements of the ordinary
ASCII set.
III: define.a, comline.a and main.a
An demonstration application that uses the string-comparison
package. Define.a loads the character collating sequence.
Comline.a reads the command line. Note that this file is
bound to Verdix Ada for Unix and must be rewritten for another
system.
Main.a is the main program. It reads lines from standard
input or a named file and writes the sorted lines to standard
output when end-of-file is detected.
You find a description of the options last in this file.
You should compile the files in the order: latin1, natascii,
strcompS, strcompB, define, comline, main.
Four-dimensional sorting
------------------------
The string-comparison package compares strings at four levels:
1) Alphabetic
2) Accents
3) Non-letters
4) Difference in case
What is an alphabetic etc is up to the user. He may define "$"
being a letter with "(" as its lowercase variant if he likes.
One level is only regarded if the level above have no difference.
As an example I take
T^ete-`a-t^ete
(I assume a "normal" loading of the character table here.)
For the first level we use TETEATETE, thus we remove the accents
and the hyphens. On the next we re-insert the accents so we get
T^ETE`AT^ETE
On level three we only take the hyphens in regard. When comparing
non-letters the package uses the simple ASCII values. The earlier
a character comes, the lower is the sort value. Thus, "trans-scription"
will precede "transscrip-tion". (Actually, as the implementation
is done, the position is more important than the ASCII value.)
On the last level we use
T^ete`at^ete
thus, the original writing with the hyphens removed. Note that the
user can specify case to be insigificant.
(This isn't a description on how the package is implemented, just
a way of illustrating the result. In practice it's done a little
more effective.)
When defining accented variants it is possible to let a character
be a variant of a string, in this way the AE ligature can be sorted
as "AE". The opposite is not possible, and what worse is, a string
can't have an alphabetic value. Thus the package is not able to sort
languages as Spanish (CH and LL) correctly.
The number characters are handled in a special way if you define them
as alphabetics. A sequence of figures will read as one number and sort
after all other alphabetics. (Even if they were defined as the first
characters.) So you will get
File1 File2 File10 File11
instead of the usual
File1 File10 File11 File2
If you like to sort them as they are read, this is also possible.
E.g. load "0" as a variant of "zero".
The package contains the following routines:
Load Operations
---------------
PROCEDURE Load_alphabetic(ch : IN character);
Loads ch as the next alphabetic character. The order of loading
determines the sorting values.
PROCEDURE Load_variant(ch : IN character;
Equ_ch : IN character;
Equ_kind : IN Equivalence_kind);
TYPE Equivalence_kind IS (Exact, Case_diff, Accented);
PROCEDURE Load_variant(ch : IN character;
Equ_str : IN string);
Load_variant loads ch as a variant of Equ_ch or Equ_str. The interpretation
of Equ_kind is:
Exact: Exactly the same. There is no difference. What you use when you
don't want case to be significant.
Case_diff: Load ch as a lowercase variant of Equ_ch. There will be
difference at level 4.
Accented: Load ch as variant of Equ_ch at level 2.
The latter version of Load_variant always loads ch at level 2.
For simplify loading, the package also provides routines for loading
a character and its ASCII lowercase equivalent simultaneously:
PROCEDURE Set_case_significance(Flag : boolean);
PROCEDURE Alpha_both_cases(ch : IN character);
PROCEDURE Variant_both_cases(ch : IN character;
Equ_ch : IN character);
PROCEDURE Variant_both_cases(ch : IN character;
Equ_str : IN string);
With Set_case_significant you determine whether case should be
significant when loading the pairs. Variant_both_cases loads ch
at level 2.
The loading operations raise Already_defined if an attempt is
made to load a character twice. If Equ_ch or part of Equ_str is
undefined, this gives the exception Undefined_equivalent.
Transscription operations
-------------------------
These routines translates a string to the internal coding.
TYPE Transscripted_string(Max_length : natural) IS PRIVATE;
PROCEDURE Transscribe(ch : IN character;
Trans_str : OUT Transscripted_string);
PROCEDURE Transscribe(Str : IN string;
Trans_str : OUT Transscripted_string);
If the transscription is too long, the routines will raise
Transscription_error.
Comparison operators:
---------------------
FUNCTION "<=" (Left, Right : Transscripted_string) RETURN boolean;
FUNCTION "<" (Left, Right : Transscripted_string) RETURN boolean;
FUNCTION ">=" (Left, Right : Transscripted_string) RETURN boolean;
FUNCTION ">" (Left, Right : Transscripted_string) RETURN boolean;
I have only included operations for comparing transscripted
strings. Of course there could be a set for uncoded strings too.
Other function
--------------
FUNCTION Is_letter(ch : character) RETURN boolean;
The demonstration program
-------------------------
The program takes the options:
-8 Use ISO/Latin-1. If not present, use 7-bit ASCII with national
replacements.
-e Case is significant. When omitted, case is not significant.
-LX Selects language. X should be one of the following:
s or S: Swedish. (Default)
d or D: Danish
g: German1: "A, "O and "U sorts as A, O and U.
G: German2: "A, "O and "U sorts as AE, OE and UE.
f or F French
In the definition routine I load space as the first alphabetic
letter. This gives the result that "Smith, Tony" will sort
before "Smithson, Alan".
SHAR_EOF
fi
if test -f 'comline.a'
then
echo shar: "will not over-write existing file 'comline.a'"
else
cat << \SHAR_EOF > 'comline.a'
----------------------------------------------------------------------
-- PROCEDURE Read_command_line --
----------------------------------------------------------------------
-- This procedure reads the command line to get the options and the
-- input file. You will probably have to replace it, unless you also
-- use Verdix Ada system for Unix.
WITH Define; Use Define;
WITH Command_line; USE Command_line; -- Verdix package
WITH Text_io;
WITH IO_exceptions;
PROCEDURE Read_command_line(Language : OUT Define.Languages;
Exact : OUT boolean;
Eightbit : OUT boolean) IS
BEGIN
FOR i IN 1..argc - 1 LOOP
IF argv(i).s(1) = '-' THEN
CASE argv(i).s(2) IS
WHEN '8' => Eightbit := true;
WHEN 'E' ! 'e' => Exact := true;
WHEN 'L' ! 'l' => CASE argv(i).s(3) IS
WHEN 's' ! 'S' => Language := Swedish;
WHEN 'd' ! 'D' => Language := Danish;
WHEN 'g' => Language := German1;
WHEN 'G' => Language := German2;
WHEN 'f' ! 'F' => Language := French;
WHEN OTHERS => NULL;
END CASE;
WHEN OTHERS => Text_io.Put_line("Unknown option: " & argv(i).s);
END CASE;
ELSE
DECLARE
USE Text_io;
Infile : File_type;
BEGIN
Open(Infile, In_file, argv(i).s);
Set_input(Infile);
EXCEPTION
WHEN IO_exceptions.Name_error =>
Put_line(argv(i).s & " does not exsist");
END;
END IF;
END LOOP;
END Read_command_line;
SHAR_EOF
fi
if test -f 'define.a'
then
echo shar: "will not over-write existing file 'define.a'"
else
cat << \SHAR_EOF > 'define.a'
----------------------------------------------------------------------
-- Define collating sequence --
----------------------------------------------------------------------
-- This package contains a procedure with the same name that demon-
-- strates the use of the load operations in the String_comparison
-- package.
PACKAGE Define IS
TYPE Languages IS (Swedish, Danish, German1, German2, French);
-- German1 sort "A, "O and "U as A, O and U. German2 as AE, OE and UE.
PROCEDURE Collatting_sequence(Language : IN Languages;
Case_significant : IN boolean;
Eightbit : IN boolean);
END Define;
WITH String_comparison; USE String_comparison;
WITH ISO_Latin_1; USE ISO_Latin_1;
WITH National_ASCII; USE National_ASCII;
PACKAGE BODY Define IS
PROCEDURE Collatting_sequence(Language : IN Languages;
Case_significant : IN boolean;
Eightbit : IN boolean) IS
BEGIN
-- Set the significane of case
Set_case_significance(Case_significant);
-- Load space as the first letter and the A to Z
Load_alphabetic(' ');
-- Load the letters from A to Z to begin with
FOR ch IN 'A'..'V' LOOP
Alpha_both_cases(ch);
END LOOP;
IF Language = Swedish THEN
Variant_both_cases('W', 'V');
ELSE
Alpha_both_cases('W');
END IF;
FOR ch IN 'X'..'Z' LOOP
Alpha_both_cases(ch);
END LOOP;
-- And so for the specific letters. Begin with the seven-bits
IF NOT Eightbit THEN
CASE Language IS
WHEN Swedish => Alpha_both_cases(SW_UC_A_ring);
Alpha_both_cases(SW_UC_A_dots);
Alpha_both_cases(SW_UC_O_dots);
Variant_both_cases(SW_UC_E_acute, 'E');
Variant_both_cases(SW_UC_U_dots, 'Y');
WHEN Danish => Alpha_both_cases(DA_UC_AE);
Alpha_both_cases(DA_UC_O_oblique);
Alpha_both_cases(DA_UC_A_ring);
WHEN German1 => Variant_both_cases(GER_UC_A_dots, 'A');
Variant_both_cases(GER_UC_O_dots, 'O');
Variant_both_cases(GER_UC_U_dots, 'U');
Load_variant(GER_LC_s_sharp, "ss");
WHEN German2 => Variant_both_cases(GER_UC_A_dots, "AE");
Variant_both_cases(GER_UC_O_dots, "OE");
Variant_both_cases(GER_UC_U_dots, "UE");
Load_variant(GER_LC_s_sharp, "ss");
WHEN French => Load_variant(FR_LC_a_grave, 'a', Accented);
Load_variant(FR_LC_c_cedilla, 'c', Accented);
Load_variant(FR_LC_e_acute, 'e', Accented);
Load_variant(FR_LC_u_grave, 'u', Accented);
Load_variant(FR_LC_e_grave, 'e', Accented);
END CASE;
-- Now lets take the eightbit case, ISO-Latin/1.
ELSE
-- First we take characters that differs from langauge to language
-- They are oA, "A, AE, "O, /O, and "U.
CASE Language IS
WHEN Swedish => Alpha_both_cases(UC_A_ring);
Alpha_both_cases(UC_A_dots);
Variant_both_cases(UC_AE, UC_A_dots);
Alpha_both_cases(UC_O_dots);
Variant_both_cases(UC_O_oblique, UC_O_dots);
Variant_both_cases(UC_U_dots, 'Y');
WHEN Danish => Alpha_both_cases(UC_AE);
Variant_both_cases(UC_A_dots, UC_AE);
Alpha_both_cases(UC_O_oblique);
Variant_both_cases(UC_O_dots, UC_O_oblique);
Alpha_both_cases(UC_A_ring);
Variant_both_cases(UC_U_dots, 'Y');
WHEN German1 !
French => Variant_both_cases(UC_A_dots, 'A');
Variant_both_cases(UC_O_dots, 'O');
Variant_both_cases(UC_U_dots, 'U');
Variant_both_cases(UC_A_ring, 'A');
Variant_both_cases(UC_O_oblique, 'O');
Variant_both_cases(UC_AE, "AE");
WHEN German2 => Variant_both_cases(UC_A_dots, "AE");
Variant_both_cases(UC_O_dots, "OE");
Variant_both_cases(UC_U_dots, "UE");
Variant_both_cases(UC_A_ring, 'A');
Variant_both_cases(UC_O_oblique, 'O');
Variant_both_cases(UC_AE, "AE");
END CASE;
-- All other variants
Variant_both_cases(UC_A_grave, 'A');
Variant_both_cases(UC_A_acute, 'A');
Variant_both_cases(UC_A_circum, 'A');
Variant_both_cases(UC_A_tilde, 'A');
Variant_both_cases(UC_C_cedilla, 'C');
Variant_both_cases(UC_E_grave, 'E');
Variant_both_cases(UC_E_acute, 'E');
Variant_both_cases(UC_E_circum, 'E');
Variant_both_cases(UC_E_dots, 'E');
Variant_both_cases(UC_Edh, 'D');
Variant_both_cases(UC_I_grave, 'I');
Variant_both_cases(UC_I_acute, 'I');
Variant_both_cases(UC_I_circum, 'I');
Variant_both_cases(UC_I_dots, 'I');
Variant_both_cases(UC_N_tilde, 'N');
Variant_both_cases(UC_O_grave, 'O');
Variant_both_cases(UC_O_acute, 'O');
Variant_both_cases(UC_O_circum, 'O');
Variant_both_cases(UC_O_tilde, 'O');
Load_variant(LC_s_sharp, "ss");
Variant_both_cases(UC_U_grave, 'U');
Variant_both_cases(UC_U_acute, 'U');
Variant_both_cases(UC_U_circum, 'U');
Variant_both_cases(UC_Y_acute, 'Y');
Load_variant(LC_y_dots, 'y', Accented);
END IF;
-- Finally the numbers
FOR ch IN '0'..'9' LOOP
Load_alphabetic(ch);
END LOOP;
END Collatting_sequence;
END Define;
SHAR_EOF
fi
if test -f 'latin1.a'
then
echo shar: "will not over-write existing file 'latin1.a'"
else
cat << \SHAR_EOF > 'latin1.a'
----------------------------------------------------------------------
-- PACKAGE ISO_Latin_1 --
----------------------------------------------------------------------
-- This package defines names for the characters in the standard
-- ISO 8859/1, known as Latin-1, that are not in the ASCII set,
-- i.e. characters with codes >= 160. (Control characters 128-159
-- are excluded.
WITH Unchecked_conversion;
PACKAGE ISO_Latin_1 IS
-- Implementation note: To define the constants within the existing
-- character type I use Unchecked_conversion. Note that this is not
-- legal Ada. Ada defines the character type as covering codes from
-- 0 to 127. Thus, all these declarations should raise Constraint_error,
-- however neither DEC Ada, nor Verdix for Unix do so.
-- Note also that the Ada definition permits an implementation to
-- restrict Unchecked_conversion.
-- The proper way would be define a new enumeration type, however this
-- requires more work, including a new Text_io.
TYPE Byte IS NEW integer RANGE 0..255;
FUNCTION Eight_bit IS NEW Unchecked_conversion(Byte, Character);
No_break_space : CONSTANT character := Eight_bit(160);
Exclaim_up_down : CONSTANT character := Eight_bit(161);
Cent : CONSTANT character := Eight_bit(162);
Pound : CONSTANT character := Eight_bit(163);
Gen_currency : CONSTANT character := Eight_bit(164);
Yen : CONSTANT character := Eight_bit(165);
Broken_bar : CONSTANT character := Eight_bit(166);
Paragraph : CONSTANT character := Eight_bit(167);
Diaraesis : CONSTANT character := Eight_bit(168);
Copyright : CONSTANT character := Eight_bit(169);
Fem_ordinal : CONSTANT character := Eight_bit(170);
L_angle_quote : CONSTANT character := Eight_bit(171);
Not_sign : CONSTANT character := Eight_bit(172);
Soft_hyphen : CONSTANT character := Eight_bit(173);
Reg_trade : CONSTANT character := Eight_bit(174);
Macron : CONSTANT character := Eight_bit(175);
Degree : CONSTANT character := Eight_bit(176);
Plus_minus : CONSTANT character := Eight_bit(177);
Super_2 : CONSTANT character := Eight_bit(178);
Super_3 : CONSTANT character := Eight_bit(179);
Acute : CONSTANT character := Eight_bit(180);
Mu : CONSTANT character := Eight_bit(181);
Pilcrow : CONSTANT character := Eight_bit(182);
Middle_dot : CONSTANT character := Eight_bit(183);
Cedilla : CONSTANT character := Eight_bit(184);
Super_1 : CONSTANT character := Eight_bit(185);
Mask_ord : CONSTANT character := Eight_bit(186);
R_angle_quote : CONSTANT character := Eight_bit(187);
Quarter : CONSTANT character := Eight_bit(188);
Half : CONSTANT character := Eight_bit(189);
Three_quarter : CONSTANT character := Eight_bit(190);
Query_up_down : CONSTANT character := Eight_bit(191);
UC_A_grave : CONSTANT character := Eight_bit(192);
UC_A_acute : CONSTANT character := Eight_bit(193);
UC_A_circum : CONSTANT character := Eight_bit(194);
UC_A_tilde : CONSTANT character := Eight_bit(195);
UC_A_dots : CONSTANT character := Eight_bit(196);
UC_A_ring : CONSTANT character := Eight_bit(197);
UC_AE : CONSTANT character := Eight_bit(198);
UC_C_cedilla : CONSTANT character := Eight_bit(199);
UC_E_grave : CONSTANT character := Eight_bit(200);
UC_E_acute : CONSTANT character := Eight_bit(201);
UC_E_circum : CONSTANT character := Eight_bit(202);
UC_E_dots : CONSTANT character := Eight_bit(203);
UC_I_grave : CONSTANT character := Eight_bit(204);
UC_I_acute : CONSTANT character := Eight_bit(205);
UC_I_circum : CONSTANT character := Eight_bit(206);
UC_I_dots : CONSTANT character := Eight_bit(207);
UC_edh : CONSTANT character := Eight_bit(208);
UC_N_tilde : CONSTANT character := Eight_bit(209);
UC_O_grave : CONSTANT character := Eight_bit(210);
UC_O_acute : CONSTANT character := Eight_bit(211);
UC_O_circum : CONSTANT character := Eight_bit(212);
UC_O_tilde : CONSTANT character := Eight_bit(213);
UC_O_dots : CONSTANT character := Eight_bit(214);
Mult_sign : CONSTANT character := Eight_bit(215);
UC_O_oblique : CONSTANT character := Eight_bit(216);
UC_U_grave : CONSTANT character := Eight_bit(217);
UC_U_acute : CONSTANT character := Eight_bit(218);
UC_U_circum : CONSTANT character := Eight_bit(219);
UC_U_dots : CONSTANT character := Eight_bit(220);
UC_Y_acute : CONSTANT character := Eight_bit(221);
UC_thorn : CONSTANT character := Eight_bit(222);
LC_s_sharp : CONSTANT character := Eight_bit(223);
LC_a_grave : CONSTANT character := Eight_bit(224);
LC_a_acute : CONSTANT character := Eight_bit(225);
LC_a_circum : CONSTANT character := Eight_bit(226);
LC_a_tilde : CONSTANT character := Eight_bit(227);
LC_a_dots : CONSTANT character := Eight_bit(228);
LC_a_ring : CONSTANT character := Eight_bit(229);
LC_ae : CONSTANT character := Eight_bit(230);
LC_c_cedilla : CONSTANT character := Eight_bit(231);
LC_e_grave : CONSTANT character := Eight_bit(232);
LC_e_acute : CONSTANT character := Eight_bit(233);
LC_e_circum : CONSTANT character := Eight_bit(234);
LC_e_dots : CONSTANT character := Eight_bit(235);
LC_i_grave : CONSTANT character := Eight_bit(236);
LC_i_acute : CONSTANT character := Eight_bit(237);
LC_i_circum : CONSTANT character := Eight_bit(238);
LC_i_dots : CONSTANT character := Eight_bit(239);
LC_edh : CONSTANT character := Eight_bit(240);
LC_n_tilde : CONSTANT character := Eight_bit(241);
LC_o_grave : CONSTANT character := Eight_bit(242);
LC_o_acute : CONSTANT character := Eight_bit(243);
LC_o_circum : CONSTANT character := Eight_bit(244);
LC_o_tilde : CONSTANT character := Eight_bit(245);
LC_o_dots : CONSTANT character := Eight_bit(246);
Div_sign : CONSTANT character := Eight_bit(247);
LC_o_oblique : CONSTANT character := Eight_bit(248);
LC_u_grave : CONSTANT character := Eight_bit(249);
LC_u_acute : CONSTANT character := Eight_bit(250);
LC_u_circum : CONSTANT character := Eight_bit(251);
LC_u_dots : CONSTANT character := Eight_bit(252);
LC_y_acute : CONSTANT character := Eight_bit(253);
LC_thorn : CONSTANT character := Eight_bit(254);
LC_y_dots : CONSTANT character := Eight_bit(255);
END ISO_latin_1;
SHAR_EOF
fi
if test -f 'main.a'
then
echo shar: "will not over-write existing file 'main.a'"
else
cat << \SHAR_EOF > 'main.a'
----------------------------------------------------------------------
-- Sort package and main program --
----------------------------------------------------------------------
-- This file contains a sort package that uses the string-comparison
-- package when sorting and the main program. The sort package is very
-- simple, it contains just one routine for inserting into the tree
-- and for writing the tree to standard output.
PACKAGE Sort_package IS
PROCEDURE Insert(Str : IN string);
PROCEDURE Write_tree;
END Sort_package;
-- The main program. Reads line from standard input and insert them
-- into the sort package. When end-of-fils is detected, write the
-- tree.
WITH Text_io;
WITH IO_exceptions;
WITH Sort_package;
WITH Define; USE Define;
WITH Read_command_line;
PROCEDURE Main IS
Language : Define.Languages := Swedish;
Eightbit : boolean := false;
Exact : boolean := false;
Line : string(1..80);
Len : natural;
BEGIN
Read_command_line(Language, Exact, Eightbit);
Define.collatting_sequence(Language, Exact, Eightbit);
LOOP
Text_io.Get_line(Line, Len);
Sort_package.Insert(Line(1..Len));
END LOOP;
EXCEPTION
WHEN IO_exceptions.End_error => Sort_package.Write_tree;
END Main;
-- Below the body of the sort package
WITH Text_io;
WITH String_comparison; USE String_comparison;
PACKAGE BODY Sort_package IS
TYPE Tree_entry(Key_size : positive; Str_len : natural);
TYPE Tree_type IS ACCESS Tree_entry;
TYPE Tree_entry(Key_size : positive; Str_len : natural) IS
RECORD
Left : Tree_type := NULL;
Right : Tree_type := NULL;
Key : Transscripted_string(Key_size);
Str : string(1..Str_len);
END RECORD;
Tree : Tree_type := NULL;
-- Internal recursive insertion procedure. Called by the exported
PROCEDURE Insert(Tree : IN OUT Tree_type;
Key : IN Transscripted_string;
Str : IN string) IS
BEGIN
IF Tree /= NULL THEN
IF Key < Tree.Key THEN
Insert(Tree.left, Key, Str);
ELSIF Key > Tree.Key THEN
Insert(Tree.right, Key, Str);
END IF;
ELSE
Tree := NEW Tree_entry(Key.Max_length, Str'length);
Tree.Key := Key;
Tree.Str := Str;
END IF;
END Insert;
-- Exported Insert
PROCEDURE Insert(Str : IN string) IS
Transscript : Transscripted_string(Str'length + 20);
BEGIN
Transscribe(Str, Transscript);
Insert(Tree, Transscript, Str);
EXCEPTION
WHEN Transscription_error =>
Text_io.Put_line(Str);
Text_io.Put_line("This line has too long transscription. Skipped.");
END Insert;
-- This procedure travserse the tree and writes all entries on standard output
PROCEDURE Write_tree(Tree : IN Tree_type) IS
BEGIN
IF Tree /= NULL THEN
Write_tree(Tree.Left);
Text_io.Put_line(Tree.Str);
Write_tree(Tree.Right);
END IF;
END Write_tree;
-- Exported Write_tree;
PROCEDURE Write_tree IS
BEGIN
Write_tree(Tree);
END;
END Sort_package;
SHAR_EOF
fi
if test -f 'natascii.a'
then
echo shar: "will not over-write existing file 'natascii.a'"
else
cat << \SHAR_EOF > 'natascii.a'
----------------------------------------------------------------------
-- PACKAGE National ASCII --
----------------------------------------------------------------------
-- This package declares alternate names for the ASCII codes
-- 64, 91-94, 96 and 123-126 to be used when when these codes refers
-- to national characters. The names are restricted to letters.
-- Languages covered: Swedish/Finnish, Danish/Norwegian, German,
-- French and Italian.
PACKAGE National_ASCII IS
-- Swedish and Finnish
SW_UC_E_acute : CONSTANT character := '@';
SW_UC_A_ring : CONSTANT character := ']';
SW_UC_A_dots : CONSTANT character := '[';
SW_UC_O_dots : CONSTANT character := '\';
SW_UC_U_dots : CONSTANT character := '^';
SW_LC_e_acute : CONSTANT character := '`';
SW_LC_a_ring : CONSTANT character := '}';
SW_LC_a_dots : CONSTANT character := '{';
SW_LC_o_dots : CONSTANT character := '|';
SW_LC_u_dots : CONSTANT character := '~';
-- Danish and Norwegian
DA_UC_AE : CONSTANT character := '[';
DA_UC_O_oblique : CONSTANT character := '\';
DA_UC_A_ring : CONSTANT character := ']';
DA_UC_U_dots : CONSTANT character := '^';
DA_LC_ae : CONSTANT character := '{';
DA_LC_o_oblique : CONSTANT character := '|';
DA_LC_a_ring : CONSTANT character := '}';
DA_LC_u_dots : CONSTANT character := '~';
-- German
GER_UC_A_dots : CONSTANT character := '[';
GER_UC_O_dots : CONSTANT character := '\';
GER_UC_U_dots : CONSTANT character := ']';
GER_LC_a_dots : CONSTANT character := '{';
GER_LC_o_dots : CONSTANT character := '|';
GER_LC_u_dots : CONSTANT character := '}';
GER_LC_s_sharp : CONSTANT character := '~';
-- French
FR_LC_a_grave : CONSTANT character := '@';
FR_LC_c_cedilla : CONSTANT character := '\';
FR_LC_e_acute : CONSTANT character := '{';
FR_LC_u_grave : CONSTANT character := '|';
FR_LC_e_grave : CONSTANT character := '}';
-- Italian
IT_LC_A_ring : CONSTANT character := ']';
IT_LC_u_grave : CONSTANT character := '`';
IT_LC_a_grave : CONSTANT character := '}';
IT_LC_o_grave : CONSTANT character := '{';
IT_LC_e_grave : CONSTANT character := '|';
IT_LC_i_grave : CONSTANT character := '~';
END National_ASCII;
SHAR_EOF
fi
if test -f 'strcompb.a'
then
echo shar: "will not over-write existing file 'strcompb.a'"
else
cat << \SHAR_EOF > 'strcompb.a'
----------------------------------------------------------------------
-- BODY string_comparison --
----------------------------------------------------------------------
-- This file contains the implementation part of the string comparison
-- package.
PACKAGE BODY string_comparison IS
-- CONTENTS
-- --------
-- Type declarations and simple functions
-- Internal Load operations
-- Exported load operations
-- Internal routines for transscribing numbers
-- Exported transscription operations
-- Internal comparison procedures
-- Exportered string comparators
-- The transscription table
-- The translation of a character is a string. This is for characters
-- like the AE ligature. Also useful is you want "0" = "zero".
TYPE Transscript_entry(Length : positive) IS
RECORD
Alphabetic : Natural_string(1..Length) := (OTHERS => 0);
Accent : Natural_string(1..Length) := (OTHERS => 0);
Case_variant : boolean := false;
END RECORD;
TYPE Entry_ptr IS ACCESS Transscript_entry;
-- Pointer to allow different sizes
-- The index in the table is the ordinal number. Ada's character type is
-- limited to 127.
Char_table : ARRAY (0..255) OF Entry_ptr := (OTHERS => NULL);
-- Other types
-- This type is for internal comparison functions
TYPE Relation_type IS (Less_than, Equal, Greater_than);
-- Range for the number characters
SUBTYPE Numbers IS integer RANGE character'pos('0')..character'pos('9');
-- Variables
-- Case significance
Case_significant : boolean := true;
-- Last used codes
Last_alpha_code : integer := 0;
Last_accent_code : integer := 0;
-- When storing an alphabetic we increment Last_alpha_code, when loading
-- a accent variant we increment Last_accent_code.
-- Simple functions
FUNCTION Is_letter(ch : character) RETURN boolean IS
BEGIN
RETURN Char_table(character'pos(ch)).Length > 0;
END;
-- Set case significance for the double-case load operations
PROCEDURE Set_case_significance(Flag : boolean) IS
BEGIN
Case_significant := Flag;
END;
-- Internal Load operations
-- These take integer parametes. The exported routines call these.
-- We're having integer to avoid problems with characters over 127.
PROCEDURE Load_alphabetic(ch : integer) IS
-- Load ch in the table as a one without any Accent part. If ch is already
-- defined, raise Already defined.
BEGIN
IF Char_table(ch) /= NULL THEN
RAISE Already_defined;
END IF;
Char_table(ch) := NEW Transscript_entry(1);
Last_alpha_code := Last_alpha_code + 1;
Char_table(ch).Alphabetic(1) := Last_alpha_code;
END Load_alphabetic;
PROCEDURE Load_variant(ch : IN integer;
Equ_ch : IN integer;
Equ_kind : IN Equivalence_kind) IS
-- Load ch as an variant of Equ_ch. Equ_ch must be defined or else
-- we raise Undefined_equivalent.
BEGIN
IF Char_table(ch) /= NULL THEN
RAISE Already_defined;
END IF;
IF Char_table(Equ_ch) = NULL THEN
RAISE Undefined_equivalent;
END IF;
Char_table(ch) := NEW Transscript_entry(Char_table(Equ_ch).Length);
Char_table(ch).Alphabetic := Char_table(Equ_ch).Alphabetic;
Char_table(ch).Accent := Char_table(Equ_ch).Accent;
Char_table(ch).Case_variant := Char_table(Equ_ch).Case_variant;
-- Actually: Char_table(ch).all := Char_table(Equ_ch).all;
-- Alas, Verdix Ada can't handle this properly
CASE Equ_kind IS
WHEN Exact => NULL;
WHEN Case_diff => Char_table(ch).Case_variant := true;
WHEN Accented => Last_accent_code := Last_accent_code + 1;
Char_table(ch).Accent(1) := Last_accent_code;
END CASE;
END Load_variant;
PROCEDURE Load_variant(ch : IN integer;
Equ_str : IN Natural_string) IS
-- Load ch as an accented letter (digraph) of Equ_str. If not all
-- characters in Equ_str are deifined, raise Undefined_equivalent.
BEGIN
IF Char_table(ch) /= NULL THEN
RAISE Already_defined;
END IF;
FOR i IN Equ_str'range LOOP
IF Char_table(Equ_str(i)) = NULL THEN
RAISE Undefined_equivalent;
END IF;
END LOOP;
Char_table(ch) := NEW Transscript_entry(Equ_str'length);
FOR i IN Equ_str'range LOOP
Char_table(ch).Alphabetic(i) := Char_table(Equ_str(i)).Alphabetic(1);
Last_accent_code := Last_accent_code + 1;
Char_table(ch).Accent(i) := Last_accent_code;
END LOOP;
END Load_variant;
-- The exported load operations
PROCEDURE Load_alphabetic(ch : IN character) IS
BEGIN
Load_alphabetic(character'pos(ch));
END Load_alphabetic;
PROCEDURE Load_variant(ch : IN character;
Equ_ch : IN character;
Equ_kind : IN Equivalence_kind) IS
BEGIN
Load_variant(character'pos(ch), character'pos(Equ_ch), Equ_kind);
END Load_variant;
PROCEDURE Load_variant(ch : IN character;
Equ_str : IN string) IS
Equ_int : Natural_string(Equ_str'range);
BEGIN
FOR i IN Equ_str'range LOOP
Equ_int(i) := character'pos(Equ_str(i));
END LOOP;
Load_variant(character'pos(ch), Equ_int);
END Load_variant;
-- Exported double-case load operations.
PROCEDURE Alpha_both_cases(ch : IN character) IS
Int_ch : integer := character'pos(ch);
BEGIN
Load_alphabetic(Int_ch);
IF Case_significant THEN
Load_variant(Int_ch + 32, Int_ch, Case_diff);
ELSE
Load_variant(Int_ch + 32, Int_ch, Exact);
END IF;
END Alpha_both_cases;
PROCEDURE Variant_both_cases(ch : IN character;
Equ_ch : IN character) IS
Int_ch : integer := character'pos(ch);
BEGIN
Load_variant(Int_ch, character'pos(Equ_ch), Accented);
IF Case_significant THEN
Load_variant(Int_ch + 32, Int_ch, Case_diff);
ELSE
Load_variant(Int_ch + 32, Int_ch, Exact);
END IF;
END Variant_both_cases;
PROCEDURE Variant_both_cases(ch : IN character;
Equ_str : IN string) IS
Int_ch : integer := character'pos(ch);
BEGIN
Load_variant(ch, Equ_str);
IF Case_significant THEN
Load_variant(Int_ch + 32, Int_ch, Case_diff);
ELSE
Load_variant(Int_ch + 32, Int_ch, Exact);
END IF;
END Variant_both_cases;
-- Internal procedure for transscribing numbers
PROCEDURE Get_number(Str : IN string;
Str_ix : IN OUT integer;
Number : OUT integer) IS
-- Assume Str(Str_ix) is a number. Read as long there are numbers.
-- Leave Str_ix at the last number character.
No_in_str : natural := 0;
ch : integer := character'pos(Str(Str_ix));
BEGIN
WHILE ch IN Numbers LOOP
No_in_str := 10 * No_in_str + ch - Numbers'first;
IF Str_ix + 1 IN Str'range THEN
Str_ix := Str_ix + 1;
ch := character'pos(Str(Str_ix));
ELSE
ch := 0;
END IF;
END LOOP;
Number := No_in_str;
EXCEPTION
WHEN Numeric_error => RAISE Transscription_error;
END;
-- Exported transscription operations
PROCEDURE Transscribe(ch : IN character;
Trans_str : OUT Transscripted_string) IS
BEGIN
Transscribe( (1 => ch), Trans_str);
END Transscribe;
PROCEDURE Transscribe(Str : IN string;
Trans_str : OUT Transscripted_string) IS
-- Transscribe Str using the table. If the transscription does
-- not fit into the out parameter, raise Transscription_error.
-- Characters in Str that are not defined are regarded as non-letters.
-- Non-letters are always stored at the their index in Str.
-- Numbers are stored specially.
ch : natural; -- Current character;
Tr_ix : natural := 0; -- Index in Trans_str except the non-letter part.
Str_ix : integer := Str'first; -- Index in Str and non-letter part.
No_in_str : natural;
BEGIN
WHILE Str_ix IN Str'range LOOP
ch := character'pos(Str(Str_ix));
IF Char_table(ch) /= NULL THEN
IF Tr_ix + Char_table(ch).Length > Trans_str.Max_length THEN
RAISE Transscription_error;
END IF;
IF ch NOT IN Numbers OR Char_table(ch).Accent(1) /= 0 THEN
FOR i IN 1..Char_table(ch).Length LOOP
Tr_ix := Tr_ix + 1;
Trans_str.Alphabetic(Tr_ix) := Char_table(ch).Alphabetic(i);
Trans_str.Case_part(Tr_ix) := Char_table(ch).Case_variant;
Trans_str.Accents(Tr_ix) := Char_table(ch).Accent(i);
END LOOP;
ELSE
Get_number(Str, Str_ix, No_in_str);
Tr_ix := Tr_ix + 1;
Trans_str.Alphabetic(Tr_ix) := 1000 + No_in_str;
END IF;
ELSE
IF Str_ix > Trans_str.Max_length THEN
RAISE Transscription_error;
END IF;
Trans_str.Non_letters(Str_ix) := ch;
Trans_str.Non_letter_length := Str_ix;
END IF;
Str_ix := Str_ix + 1;
END LOOP;
Trans_str.Length := Tr_ix;
END Transscribe;
-- Internal comparison routines
FUNCTION Relation(Left, Right : Natural_string) RETURN Relation_type IS
-- This function is more os less obsolete. "<" etc should do the job.
-- Verdix Ada can't this on integer arrays, unfortunately.
i : positive := 1;
Bug : EXCEPTION; -- Should not occur
BEGIN
WHILE (i <= Left'last AND i <= Right'last) AND THEN
Left(i) = Right(i) LOOP
i := i + 1;
END LOOP;
IF i > Left'last AND i > Right'last THEN
RETURN Equal;
ELSIF i > Left'last THEN
RETURN Less_than;
ELSIF i > Right'last THEN
RETURN Greater_than;
ELSIF Left(i) < Right(i) THEN
RETURN Less_than;
ELSIF Left(i) > Right(i) THEN
RETURN Greater_than;
ELSE
RAISE Bug; -- This should not occur.
END IF;
END Relation;
FUNCTION Relation(Left, Right : Transscripted_string) RETURN Relation_type IS
-- Compare the parts in order. Continue as long as there is unequallity.
Rel : Relation_type;
BEGIN
Rel := Relation(Left.Alphabetic(1..Left.Length),
Right.Alphabetic(1..Right.Length));
IF Rel /= Equal THEN
RETURN Rel;
END IF;
Rel := Relation(Left.Accents(1..Left.Length),
Right.Accents(1..Right.Length));
IF Rel /= Equal THEN
RETURN Rel;
END IF;
Rel := Relation(Left.Non_letters(1..Left.Non_letter_length),
Right.Non_letters(1..Right.Non_letter_length));
IF Rel /= Equal THEN
RETURN Rel;
END IF;
IF Left.Case_part(1..Left.Length) <
Right.Case_part(1..Right.Length) THEN
RETURN Less_than;
ELSIF Left.Case_part(1..Left.Length) >
Right.Case_part(1..Right.Length) THEN
RETURN Greater_than;
ELSE
RETURN Equal;
END IF;
END Relation;
-- Exported comparison operators
FUNCTION "<=" (Left, Right : Transscripted_string) RETURN boolean IS
BEGIN
RETURN Relation(Left, Right) /= Greater_than;
END;
FUNCTION "<" (Left, Right : Transscripted_string) RETURN boolean IS
BEGIN
RETURN Relation(Left, Right) = Less_than;
END;
FUNCTION ">=" (Left, Right : Transscripted_string) RETURN boolean IS
BEGIN
RETURN Relation(Left, Right) /= Less_than;
END;
FUNCTION ">" (Left, Right : Transscripted_string) RETURN boolean IS
BEGIN
RETURN Relation(Left, Right) = Greater_than;
END;
END string_comparison;
SHAR_EOF
fi
if test -f 'strcomps.a'
then
echo shar: "will not over-write existing file 'strcomps.a'"
else
cat << \SHAR_EOF > 'strcomps.a'
----------------------------------------------------------------------
-- SPECIFCATION String_comparison --
----------------------------------------------------------------------
-- This package provides operations for comparing strings according to
-- a user-defined scheme.
-- The package contains operations for load an internal coding table,
-- routines for coding strings and for comparing coded strings.
PACKAGE String_comparison IS
-- Load a character as the next in the primary colltating sequence
PROCEDURE Load_alphabetic(ch : IN character);
PROCEDURE Alpha_both_cases(ch : IN character);
-- Load a variant of a character in the main sequence, on accent
-- level, on case level or as exactly the same.
TYPE Equivalence_kind IS (Exact, Case_diff, Accented);
PROCEDURE Load_variant(ch : IN character;
Equ_ch : IN character;
Equ_kind : IN Equivalence_kind);
-- The three below always load on accent level.
PROCEDURE Load_variant(ch : IN character;
Equ_str : IN string);
PROCEDURE Variant_both_cases(ch : IN character;
Equ_ch : IN character);
PROCEDURE Variant_both_cases(ch : IN character;
Equ_str : IN string);
-- Exceptions that can be raised by the load operations
Undefined_equivalent : EXCEPTION;
Already_defined : EXCEPTION;
-- Change case significance when loading both cases. Default is off.
PROCEDURE Set_case_significance(Flag : boolean);
-- Transscript type and coding operations
TYPE Transscripted_string(Max_length : natural) IS PRIVATE;
PROCEDURE Transscribe(ch : IN character;
Trans_str : OUT Transscripted_string);
PROCEDURE Transscribe(Str : IN string;
Trans_str : OUT Transscripted_string);
Transscription_error : EXCEPTION;
-- Comparison operators
FUNCTION "<=" (Left, Right : Transscripted_string) RETURN boolean;
FUNCTION "<" (Left, Right : Transscripted_string) RETURN boolean;
FUNCTION ">=" (Left, Right : Transscripted_string) RETURN boolean;
FUNCTION ">" (Left, Right : Transscripted_string) RETURN boolean;
-- Others
FUNCTION Is_letter(ch : character) RETURN boolean;
PRIVATE
TYPE Natural_string IS ARRAY(integer RANGE <>) OF natural;
TYPE Boolean_string IS ARRAY(integer RANGE <>) OF boolean;
TYPE Transscripted_string(Max_length : natural) IS
RECORD
Length : natural := 0;
Alphabetic : Natural_string(1..Max_length) := (OTHERS => 0);
Accents : Natural_string(1..Max_length) := (OTHERS => 0);
Case_part : Boolean_string(1..Max_length) := (OTHERS => false);
Non_letter_length : natural := 0;
Non_letters : Natural_string(1..Max_length) := (OTHERS => 256);
END RECORD;
END String_comparison;
SHAR_EOF
fi
exit 0
# End of shell archive