Opened 4 years ago

Closed 3 years ago

#261 closed defect (fixed)

Parentheses in search don't work with alternate search terms

Reported by: tbrooks Owned by: tbrooks
Priority: critical Milestone:
Component: WebSearch Version:
Keywords: INSPIRE Syntax Oct Cc:

Description

Consider the search

author:"dixon, lance j" or (author:"dixon, la j" and year:0->2011)

which is, admittedly a contrived example. Note that it must be equivalent to

author:"dixon, lance j" or author:"dixon, la j"

given that we have no papers with years outside that range.

Yet, on inspirebeta, the first gives 255, while the second 121.

This is because author:"dixon, la j" triggers the:

No exact match found for dixon, la j, using dixon la j instead...

search modification inline when it gets no results. This is just wrong, and is very dangerous with SPIRES-style author searches.

Similar annoyance is caused by the nearest terms box, though that doesn't generally change the search result. (e.g.:

http://inspirebeta.net/search?ln=en&p=find+%28t+supergtre+and+a+brooks%29+or+a+witten&f=&action_search=Search&sf=&so=d&rm=&rg=25&sc=0&of=hb

Where there is a bunch of noise (again exacerbated with SPIRES-style author expansion) that might be helpful, but might just be confusing.

I'm submitting a trivial patch that fixes this by setting ap=0 and nearest_terms_box=False when search_pattern_parenthesized calls search_pattern on the subclauses.

Note that this means that we no longer give any "help" to users if a part _or all_ of their parenthesized query gives no results. This seems reasonable to me, but this will extend to authors in the SPIRES syntax, which might be less reasonable.

A better long term solution might be to refactor all of this so that, at the end of the search, one can check on whether 0 results were found, and give suggestions then, rather than changing/suggesting things in the middle of a compound search, with a possible enhancement to let people know, in a subtler way, if one of their subclauses failed to return results.


Change History (12)

comment:1 Changed 4 years ago by tbrooks

  • Status changed from new to in_merge

Please take my branch:

261-parens-and-alternate-search

for the fix described above

Version 0, edited 4 years ago by tbrooks (next)

comment:2 follow-up: Changed 4 years ago by jblayloc

Your point about checking for number of results and making suggestions at the end is well taken. I agree that this is the right thing to do.

But the specific set of search discrepancies you mention are fixed in a branch with master+131...

comment:3 in reply to: ↑ 2 Changed 4 years ago by tbrooks

Replying to jblayloc:

But the specific set of search discrepancies you mention are fixed in a branch with master+131...

Indeed they are, however the similar search terms box fires anytime we do a search like:

find a brooks, travis

since there are a lot of "ors" in that search expansion of the first name.

This is _very_ bad because the "no results" pushes the actual search result off of the page and a user could easily think that there are no results for the search at all.

At the very least I'd take my patch below (wrapped in CFG_INSPIRE_SITE == 1) and deploy it soon, to fix this immediate problem.

Another option is to move the "no results" to the bottom, below the search results that were found.

comment:4 Changed 4 years ago by jblayloc

Ok, I read your patch. I agree with Tibor that this isn't ideal. I also agree with you that the current behavior for
find a brooks, travis
is crap.

So I reverted your change, but set a gate on CFG_INSPIRE_SITE so that ap would be set to 0 and display_nearest_search_box would be set to False. I also made the a commit message that "Addresses" rather than "Fixes" this ticket.

I think we could deploy this on PROD while I work on building the correct fix, which I also think is to collect the suggestions and display them once, at the end of the processing loop. I can start working on this tonight or tomorrow a.m.

comment:5 Changed 4 years ago by tbrooks

  • Keywords INSPIRE Syntax Oct added

comment:6 Changed 4 years ago by simko

1) WRT current behaviour of find a brooks, travis, this query gets expanded by the SPIRES syntax compatibility parser into:

['+', 'author:"brooks, travis*"', '|', 'exactauthor:"brooks, t"', '|',
'exactauthor:"brooks, tr"', '|', 'exactauthor:"brooks, tra"', '|',
'exactauthor:"brooks, trav"', '|', 'exactauthor:"brooks, travi"']

I wonder how common is the need to iterate through all the various travi-trav-tra-tr forms; maybe a simpler expansion like:

author:"brooks, travis*" OR exactauthor:"brooks, t"

would cover the most common use cases well already?

If we manage to simplify the query expansion, it would help in removing the most weird stuff like brooks, tra as if by passing. Although it would not solve the longer-term task fully, of course, e.g. the internal parsing logic may still get exposed for unsuccessful queries like find a brooks, zelda.

2) WRT CFG_INSPIRE_SITE changes, I'll wait for the final updates then, but please check how things work for Invenio syntax vs SPIRES syntax too, because you are addressing search_pattern_parenthesised() globally, which is called also for Invenio syntax. So we may get into troubles for queries like reportnumber:cern/th if they happen to live inside parens.

In longer term, if SPIRES syntax parser does query expansion in the pre-search times, while the Invenio syntax parser leaves this for the later post-search times, so to speak, then indeed the ap behaviour may be harmful in one case, while useful in the other case, and so it would seem necessary to distinguish the two for the whole duration of the search, if we want to offer both alternatives at run time in parallel. This could be done for example by enriching basic search units (opft) to include an information whether ap is wanted for a particular unit or not. If we set ap only globally, then we will loose one way or another. If we set ap differently for paren vs non-paren expressions, we may get into (foo) vs foo differences. Moreover, on a similar note, people may get confused by SPIRES mode vs Invenio mode differences, find a brooks, travis vs brooks, travis in author field.

comment:7 Changed 4 years ago by jblayloc

I've cleaned up the temporary fix to 261 a little more so the SPIRES syntax converter can report whether it's actually responsive to a given search or not; this makes knowing whether the search had started life as a SPIRES syntax query easy. It's pushed to my repository in branch 261-parens-and-alternate-search.

(Travis you should delete your copy (which after all is a proper subset of mine) to avoid confusion.)

comment:8 Changed 4 years ago by simko

Thanks. Since the second commit fully reverted the first "temporary" commit, I preferred to squash them together, keeping only Joe's one, so that we don't have to wonder about the commit history if we happen to have to look at the code in some time from now. I hope this is OK with you.

(It is kind of like when I'm merging and there are some ultra simple issues such as today's syntax ones (ticket:296#comment:4) or datemodified ones (ticket:283#comment:11); then I'm often doing in-situ fixes in the name of the author during merge so that the repo history is clean.)

comment:9 Changed 4 years ago by Joe Blaylock <jrbl@…>

In [ef470a6aec5f619319d29819dbc0e2edbbaa76b7]:

WebSearch: less noisy `ap' for SPIRES syntax

  • SPIRES syntax queries are pre-analyzed and may lead to many internal Boolean OR queries for various search units that were not present in the original query. Post-search combination of queries for these units at the level of hitsets with alternate or no results are therefore not suitable for these situations. ap=0 and display_nearest_terms_box=False is therefore set to work around these issues.
  • Simplifies semantics around detecting presence of SPIRES syntax, improves kwalitee slightly. Also adds FIXME so we don't lose history.
  • This commit addresses #261.

comment:10 Changed 4 years ago by simko

  • Status changed from in_merge to assigned

Deployed on TEST and PROD; putting the ticket back to the assigned status for the long-term solution. Though we may perhaps want to close it here and create a new proper ticket for the long-term solution. (e.g. this ticket's subject originated mostly about paren related issues anyway)

comment:11 Changed 3 years ago by tbrooks

  • Status changed from assigned to in_merge

Created new ticket, #327

comment:12 Changed 3 years ago by tbrooks

  • Resolution set to fixed
  • Status changed from in_merge to closed
Note: See TracTickets for help on using tickets.