Differences between Stopword Files - Full-Text Retrieval (FTR) - Help

The following chart shows how the character classes defined in each delivered stopword file affect indexing. Each input string is followed by the terms that appear in the collection after the string is indexed with that stopword file in place.

Stopword files are supported in Directa, but not in SmartPlant Foundation.

| String            | fultext.stp     | cc_all.stp        | cc_join.stp     |
|-------------------|-----------------|-------------------|-----------------|
| (in parens)       | IN PARENS       | (IN PARENS)       | IN PARENS       |
| ABC123            | ABC 123         | ABC123            | ABC123          |
| It is x12-328.    | IT IS X 12 328  | IT IS X12-328.    | IT IS X12-328   |
| Cherokee, N.C.    | CHEROKEE N C    | CHEROKEE, N. C.   | CHEROKEE N.C    |
| 08/03/94          | 08 03 94        | 08/03/94          | 08/03/94        |
| 401(k)            | 401 K           | 401(K)            | 401(K           |
| Hi there!         | HI THERE        | HI THERE!         | HI THERE        |
| $1,000,000.00     | 1,000,000.00    | $1,000,000.00     | 1,000,000.00    |
| abc-123           | ABC 123         | ABC-123           | ABC-123         |
| Subject: My test. | SUBJECT MY TEST | SUBJECT: MY TEST. | SUBJECT MY TEST |
| /usr/tmp/dog.txt  | USR TMP DOG TXT | /USR/TMP/DOG.TXT  | USR/TMP/DOG.TXT |
| x==1;             | X 1             | X==1;             | X               |

The fultext.stp stopword file breaks every string except numbers at each punctuation character. It also splits alphanumeric terms wherever letters and digits meet. The cc_all.stp file breaks terms only at spaces, so all punctuation, including leading and trailing punctuation, becomes part of the term. The cc_join.stp file also breaks terms at spaces, but ignores leading and trailing punctuation.
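The three behaviors described above can be approximated in a few lines of Python. This is only a rough sketch of the prose description, not the character class definitions inside the delivered .stp files: the function names are hypothetical, and the assumption that a number keeps only embedded commas and periods is inferred from the chart. It may not reproduce every row of the chart exactly.

```python
import re
import string

# Hypothetical helpers for illustration only. The rules below are
# assumptions drawn from the prose description, not from the .stp files.

# Assumed fultext.stp-like rule: a number may keep embedded commas and
# periods (1,000,000.00); otherwise letters and digits form separate runs
# and every other character acts as a break.
_FULTEXT_TOKEN = re.compile(r"\d+(?:[.,]\d+)*|[A-Za-z]+")


def fultext_like(text):
    """Break on punctuation (except inside numbers) and between letters and digits."""
    return _FULTEXT_TOKEN.findall(text.upper())


def cc_all_like(text):
    """Break on whitespace only; all punctuation stays in the term."""
    return text.upper().split()


def cc_join_like(text):
    """Break on whitespace, then drop leading and trailing punctuation."""
    stripped = (tok.strip(string.punctuation) for tok in text.upper().split())
    return [tok for tok in stripped if tok]


if __name__ == "__main__":
    for sample in ("ABC123", "$1,000,000.00", "/usr/tmp/dog.txt"):
        print(sample, fultext_like(sample), cc_all_like(sample), cc_join_like(sample))
```

Running the script prints each sample string followed by the three term lists, which can be compared against the corresponding rows of the chart.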

The delivered stopword files can be used as a basis for forming almost any character class definition of your own.