World Book - Encyclopedia of Science




Text File | 2001-05-08 | 4KB | 78 lines

# $Id: style.prm,v 1.7 1996/09/15 00:36:13 bjohnson Exp $
# Copyright (C) 1987-1995 Verity, Inc.
#
# style.prm - collection schema parameters
#
# This file is used to enable/disable index schema features through
# macro definitions similar to those allowed by the C preprocessor.
# This file is included in other style files using $include so
# that the selected features are propagated to the schemas of all
# tables in the index. Refer to the "Using the style.prm File"
# chapter in the Collection Building Guide for more information.

# -----------------------------------------------------------------
# The IDX-CONFIG parameter defines the storage format used to
# encode the word positions in the index. WCT (Word Count) format
# is a compact format, storing the ordinal counting position of the
# word from the beginning of the document. PSW (Paragraph, Sentence,
# Word) format takes approximately 15-20% more disk space, but
# stores semantically accurate paragraph and sentence boundaries.
# Optionally, Many may be specified with either WCT or PSW to
# improve the accuracy of the <MANY> operator at the expense of
# disk space and search performance.

# This example enables Word Count word position format (the default).
$define IDX-CONFIG "WCT"

# This example turns on Paragraph/Sentence/Word word position format.
# It also enables the <MANY> operator accuracy improvement.
#$define IDX-CONFIG "PSW Many"

# -----------------------------------------------------------------
# The IDXOPTS parameters define which index options are applied to
# the various index token tables. The following index options are
# supported for each: Stemdex enables an index by the stem of each
# word. Casedex stores all case variants of a word separately, so
# one can search for case-sensitive terms such as "Jobs", "Apple",
# and "NeXT" more easily. Soundex stores phonetic representations
# of the word, using AT&T's standard soundex algorithm. The
# application may also store 1-4 bytes of application-specific
# data with each word instance, in the form of Location data and/or
# Qualify Instance data. These options are specified separately
# for each token table: word, zone, and zone attribute.

$define WORD-IDXOPTS "Stemdex Casedex Soundex"
$define ZONE-IDXOPTS "Stemdex Casedex Soundex"
$define ATTR-IDXOPTS ""

# -----------------------------------------------------------------
# Clustering is enabled by uncommenting the DOC-FEATURES line.
# This stores a feature vector for each document in the
# Documents table. These features are used for Clustering
# results and fast Query-by-Example. See the discussions on
# Clustering in the Collection Building Guide for more information.

#$define DOC-FEATURES "TF"

# -----------------------------------------------------------------
# Document Summarization is enabled by uncommenting one of
# the DOC-SUMMARIES lines below. The summarization data is
# stored in the documents table so that it might easily be
# shown when displaying the results of a search.
# See the discussions on Document Summarization in the
# Collection Building Guide for more information.

# The example below stores the best three sentences of
# the document, but not more than 500 bytes.
#$define DOC-SUMMARIES "XS MaxSents 3 MaxBytes 500"

# The example below stores the first four sentences of
# the document, but not more than 500 bytes.
#$define DOC-SUMMARIES "LS MaxSents 4 MaxBytes 500"

# The example below stores the first 150 bytes of
# the document, with whitespace compressed.
#$define DOC-SUMMARIES "LB MaxBytes 150"
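
The file toggles features by leaving a $define line active or commenting it out with a leading '#'. As a rough illustration of how one might audit which settings are live, here is a small Python sketch. It is not a Verity tool: the function name parse_style_prm and the regular expression are hypothetical, and the line syntax is inferred only from the $define lines in this file. It does not handle $include or any other directive.

```python
import re

# Assumed syntax, inferred from this file only: an active macro looks like
#   $define NAME "VALUE"
# and a leading '#' turns the same line into a disabled example.
DEFINE_RE = re.compile(r'^(#?)\s*\$define\s+([A-Za-z0-9_-]+)\s+"([^"]*)"\s*$')

def parse_style_prm(text):
    """Return (active, disabled).

    active   -- dict mapping macro name to its value for live $define lines
    disabled -- list of (name, value) pairs for commented-out $define lines
    """
    active = {}
    disabled = []
    for line in text.splitlines():
        m = DEFINE_RE.match(line.strip())
        if not m:
            continue  # ordinary comment, separator, or blank line
        commented, name, value = m.groups()
        if commented:
            disabled.append((name, value))
        else:
            active[name] = value
    return active, disabled

SAMPLE = '''\
# This example enables Word Count word position format (the default).
$define IDX-CONFIG "WCT"
#$define IDX-CONFIG "PSW Many"
$define WORD-IDXOPTS "Stemdex Casedex Soundex"
$define ATTR-IDXOPTS ""
#$define DOC-SUMMARIES "XS MaxSents 3 MaxBytes 500"
'''

active, disabled = parse_style_prm(SAMPLE)
print(active["IDX-CONFIG"])            # the live word position format
print(active["WORD-IDXOPTS"].split())  # index options for the word table
print(disabled)                        # examples still commented out
```

Keeping the disabled entries separate from the active ones makes it easy to see which documented alternatives (for example "PSW Many") are available but not switched on.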
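
The IDX-CONFIG comments contrast WCT, which records a word's ordinal position from the start of the document, with PSW, which records paragraph, sentence, and word. The Python sketch below only illustrates that conceptual difference on a toy text. It says nothing about Verity's actual on-disk encoding, and several details are assumptions: numbering starts at 1, the word counter restarts in each sentence, paragraphs are separated by blank lines, and sentences end at '.', '!' or '?'.

```python
import re

def word_count_positions(text):
    """WCT-style view: (word, ordinal position from the start of the document)."""
    words = re.findall(r"\w+", text)
    return [(w, i) for i, w in enumerate(words, start=1)]

def psw_positions(text):
    """PSW-style view: (word, (paragraph, sentence, word)) for each word.

    Paragraph and sentence splitting here is deliberately naive and is
    an assumption of this sketch, not the indexer's real rules.
    """
    out = []
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for p_no, para in enumerate(paragraphs, start=1):
        sentences = [s for s in re.split(r"[.!?]+", para) if s.strip()]
        for s_no, sent in enumerate(sentences, start=1):
            for w_no, w in enumerate(re.findall(r"\w+", sent), start=1):
                out.append((w, (p_no, s_no, w_no)))
    return out

DOC = "Cats sleep. Dogs bark loudly.\n\nBirds sing."

for word, pos in word_count_positions(DOC):
    print(word, pos)
for word, pos in psw_positions(DOC):
    print(word, pos)
```

In the flat view, "sleep" and "Dogs" sit at adjacent positions 2 and 3, with nothing to show that a sentence boundary lies between them. In the structured view they carry different sentence numbers, which is the kind of boundary information the file says PSW preserves at the cost of extra disk space.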