Opened 22 months ago
WebSearch: use index-time word-breaking information at search time as well
Reported by: simko | Owned by:
On the demo site, searching for "spectrum." produces the warning:
No exact match found for spectrum., using spectrum instead...
followed by two hits.
Since the dot is stripped from indexed terms at index time (see CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS, CFG_BIBINDEX_CHARS_PUNCTUATION and friends), the search engine should not need to look for the dotted version at search time.
The purpose of this ticket is to use CFG_BIBINDEX_CHARS_PUNCTUATION and friends at search time as well. That is, if a character is stripped during indexing, it should also be stripped at search time when looking up words (but not phrases or regexps). We can amend search_unit_in_bibwords so that incoming search terms are washed the same way as during indexing.
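A minimal sketch of such search-time washing follows, assuming the indexer strips punctuation from the start and end of each word and also stores the sub-words split on alphanumeric separators. The regexp values below are assumed for illustration and may not match the real configuration, and `wash_search_word` is a hypothetical helper that search_unit_in_bibwords could call on word queries only (not on phrases or regexps).

```python
import re

# ASSUMED values for illustration; the real ones come from the bibindex
# configuration and may differ.
CFG_BIBINDEX_CHARS_PUNCTUATION = r"[\.\,\:\;\?\!\"]"
CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS = r"[\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\=\>\?\@\[\\\]\^\_\`\{\|\}\~]"

# Punctuation is stripped only at the beginning and end of a word.
RE_PUNCT_BEGIN = re.compile(r"^" + CFG_BIBINDEX_CHARS_PUNCTUATION + r"+")
RE_PUNCT_END = re.compile(CFG_BIBINDEX_CHARS_PUNCTUATION + r"+$")
RE_SEPARATORS = re.compile(CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS)


def wash_search_word(term):
    """Hypothetical helper: wash one search word the way the indexer would.

    Returns the list of word forms to look up in the word index.
    """
    term = RE_PUNCT_BEGIN.sub("", term)
    term = RE_PUNCT_END.sub("", term)
    if not term:
        return []
    forms = [term]
    # Also look up the sub-words obtained by splitting on separators,
    # assuming the indexer stored them as separate terms.
    for sub in RE_SEPARATORS.split(term):
        if sub and sub not in forms:
            forms.append(sub)
    return forms
```

With this, a query for "spectrum." would be looked up directly as "spectrum", so the "No exact match found" fallback would no longer be needed for this case.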
Note that this may also affect stemming, stopwords and the like, but another ticket covers centralising the indexing configuration, so further improvements can be handled there. See ticket:852.