Opened 4 years ago

Last modified 4 years ago

#22 new defect

BibIndex (fulltext): Demo fulltext searching on inspire-hep-dev doesn't find 'rattazzon'

Reported by: tbrooks
Owned by:
Priority: minor
Milestone:
Component: BibIndex
Version:
Keywords:
Cc:

Description

Surprisingly, a fulltext search on:

astro-ph/0607086

does not find rattazzon even though it is in the fulltext (see snippet for "honor theorist" search in fulltext)

This may be due to its enclosure in quotes in the text...

Change History (1)

comment:1 Changed 4 years ago by simko

The problem seems to be due to non-ASCII UTF-8 quotes.
If one searches for ‘rattazzon’, one finds it:

http://inspire-hep-dev.cern.ch/search?p=‘rattazzon’&f=fulltext

The current word breaking sequencer handles only ASCII quotes.
That is, ` not ‘, and ' not ’.

We should add all the common UTF-8 characters of that kind
to the config.
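
To illustrate the effect, here is a minimal sketch in Python of a word breaker driven by a configurable set of separator characters. It is not the actual BibIndex code or configuration: the names ASCII_QUOTES, UNICODE_QUOTES and make_tokenizer are hypothetical, and the list of Unicode quote characters is only a plausible starting set.

```python
import re

# Hypothetical separator sets for illustration only; these are NOT the
# real BibIndex configuration variables.
ASCII_QUOTES = "`'\""
# Common typographic quotes: single/double curly quotes, low-9 quotes,
# and single/double angle quotes (guillemets).
UNICODE_QUOTES = (
    "\u2018\u2019\u201a\u201b"
    "\u201c\u201d\u201e\u201f"
    "\u00ab\u00bb\u2039\u203a"
)


def make_tokenizer(separators):
    """Return a function that splits text on whitespace and `separators`."""
    pattern = re.compile("[\\s" + re.escape(separators) + "]+")

    def tokenize(text):
        # Lowercase, split on separators, and drop empty strings.
        return [tok for tok in pattern.split(text.lower()) if tok]

    return tokenize


# Breaker that knows only ASCII quotes (the behaviour described above).
tokenize_ascii = make_tokenizer(ASCII_QUOTES)
# Breaker extended with the common Unicode quote characters.
tokenize_unicode = make_tokenizer(ASCII_QUOTES + UNICODE_QUOTES)

sample = "in honor of the theorist \u2018rattazzon\u2019"

if __name__ == "__main__":
    print(tokenize_ascii(sample))
    print(tokenize_unicode(sample))
```

With the ASCII-only set, the quotes stay glued to the word, so the index term is ‘rattazzon’ rather than rattazzon, which matches the search behaviour reported here. With the extended set, the bare word is indexed.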
