# $Id: style.prm,v 1.7 1996/09/15 00:36:13 bjohnson Exp $
# Copyright (C) 1987-1995 Verity, Inc.
#
# style.prm - collection schema parameters
#
# This file is used to enable/disable index schema features through
# macro definitions similar to those allowed by the C preprocessor.
# This file is included in other style files using $include so
# that the selected features are propagated to the schemas of all
# tables in the index. Refer to the "Using the style.prm File"
# chapter in the Collection Building Guide for more information.
# -----------------------------------------------------------------
# The IDX-CONFIG parameter defines the storage format used to
# encode the word positions in the index. WCT (Word Count) format
# is a compact format, storing the ordinal counting position of the
# word from the beginning of the document. PSW (Paragraph, Sentence,
# Word) format takes approximately 15-20% more disk space, but
# stores semantically accurate paragraph and sentence boundaries.
# Optionally, Many may be specified with either WCT or PSW to
# improve the accuracy of the <MANY> operator at the expense of
# disk space and search performance.
# This example enables Word Count word position format (the default).
$define IDX-CONFIG "WCT"
# This example turns on Paragraph/Sentence/Word word position format.
# It also enables the <MANY> operator accuracy improvement.
#$define IDX-CONFIG "PSW Many"
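# As a further illustration (inferred from the description above, not
# taken from the Verity documentation), this example would keep the
# compact Word Count format while also enabling the <MANY> operator
# accuracy improvement.
#$define IDX-CONFIG "WCT Many"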
# -----------------------------------------------------------------
# The IDXOPTS parameters define which index options are applied to
# the various index token tables. The following index options are
# supported for each: Stemdex enables an index by the stem of each
# word. Casedex stores all case variants of a word separately, so
# one can search for case-sensitive terms such as "Jobs", "Apple",
# and "NeXT" more easily. Soundex stores phonetic representations
# of the word, using AT&T's standard soundex algorithm. The
# application may also store 1-4 bytes of application-specific
# data with each word instance, in the form of Location data and/or
# Qualify Instance data. These options are specified separately
# for each token table: word, zone, and zone attribute.
$define WORD-IDXOPTS "Stemdex Casedex Soundex"
$define ZONE-IDXOPTS "Stemdex Casedex Soundex"
$define ATTR-IDXOPTS ""
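# As an illustration only (an assumed combination, not taken from the
# Verity documentation), this example would index words by stem and by
# case variant while omitting the Soundex phonetic index.
#$define WORD-IDXOPTS "Stemdex Casedex"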
# -----------------------------------------------------------------
# Clustering is enabled by uncommenting the DOC-FEATURES line.
# This stores a feature vector for each document in the
# Documents table. These features are used for Clustering
# results and fast Query-by-Example. See the discussions on
# Clustering in the Collection Building Guide for more information.
#$define DOC-FEATURES "TF"
# -----------------------------------------------------------------
# Document Summarization is enabled by uncommenting one of
# the DOC-SUMMARIES lines below. The summarization data is
# stored in the documents table so that it can easily be
# shown when displaying the results of a search.
# See the discussions on Document Summarization in the
# Collection Building Guide for more information.
# The example below stores the best three sentences of
# the document, but not more than 500 bytes.
#$define DOC-SUMMARIES "XS MaxSents 3 MaxBytes 500"
# The example below stores the first four sentences of
# the document, but not more than 500 bytes.
#$define DOC-SUMMARIES "LS MaxSents 4 MaxBytes 500"
# The example below stores the first 150 bytes of
# the document, with whitespace compressed.
#$define DOC-SUMMARIES "LB MaxBytes 150"