Opened 4 years ago

Last modified 11 months ago

#269 in_review defect

Odd parenthesis use confuses... something.

Reported by: jblayloc Owned by: simko
Priority: major Milestone:
Component: WebSearch Version:
Keywords: refactor Invenio INSPIRE Syntax NEWS Cc:

Description

find (t supergravity or result ) and t quark

Note that space after 'result'. Without it, the search works. Which means it gives no results. But with that space, then it complains that there are mismatched parentheses and breaks.

This is the behavior in my #131 version of search_engine_query_parser.

But the version of seqp in master breaks just as much, and regardless of whether there's a space.

Annette noticed that (on -prod):
fin (a hawking) gives nothing
fin (a hawking or ellis) breaks too (only 3 results)

... but fin (a hawking or a ellis) looks OK

So there's something fascinatingly bizarre going on.

Change History (18)

comment:1 Changed 4 years ago by jblayloc

A simpler query is (ellis ), which breaks in Invenio syntax.

This is not in the query parser; if you call Parser.parse_query('(ellis )') it produces exactly the right output. So the problem is somewhere else in the stack, probably in search_engine.py.

I haven't yet checked whether the other paren related breakages that Annette noticed are in the parser or not.

comment:2 Changed 4 years ago by jblayloc

Annette's other examples also work when you call Parser.parse_query(query_string), so this is also being broken somewhere else in the stack.

comment:3 Changed 4 years ago by jblayloc

  • Summary changed from transitive title searches sometimes to Odd parenthesis use confuses... something.

comment:4 Changed 4 years ago by jblayloc

Also note from the user tests with Lance Dixon:

He found it natural to use syntax like author:dixon -(collaboration:D0) title:QCD which is sort of odd, but I understand his motivation... anyway the point is that this failed, due to the parens in inspirebeta. I did not check that this worked in tislxn1

comment:5 Changed 4 years ago by jblayloc

New version pushed per conversation on inspire-dev. Branch name is 296-WebSearch_SPIRES_parser_uses_ISO_dates.

comment:6 Changed 3 years ago by jblayloc

Some searches are fixed, but some are still giving problems. Notably:

  1. fin (a hawking)
  2. (ellis )
  3. author:dixon -(collaboration:D0) title:QCD

cases 1 and 3 are essentially similar and I think the problem is in search_engine.py in search_pattern() somewhere.

case 2 is just weird.

comment:7 Changed 3 years ago by jblayloc

  • Owner changed from jblayloc to valkyrie
  • Status changed from new to assigned

Valkyrie, I'm hoping that the work you've been doing recently with parenthesized parser will negate the need for this, in which case I invite you to worksforme it and move on. This is also not very high priority, but would be a natural follow-on to finishing the SQPP plugin.

comment:8 Changed 3 years ago by valkyrie

  • Status changed from assigned to in_merge

The cases with (a hawking ) etc., were failing due to the empty string between hawking and ) being expanded as a part of the name. I added some scrubbers that will remove these unnecessary spaces and also made the author name regexp a little smarter. This is available in branch 269-parentheses_quirks on my public AFS repo at SLAC.

comment:9 Changed 3 years ago by valkyrie

  • Status changed from in_merge to assigned

I am setting this back to fail because the UI still captures the searches incorrectly; this will be resolved with ticket #453 when everything is directed through the parenthesised parser. The unit tests for these cases now pass, but the integration step will be necessary for the changes to be visible to users.

comment:10 Changed 3 years ago by valkyrie

Additional info on this:

The SPIRES part parses correctly into (author:hawking) but there is a special case currently in search_engine that prevents search terms inside parentheses that do not contain spaces (this special case is for search terms like U(1) and SL(2,Z)) from going through the parenthesised parser. I will be working on sending all terms through the parenthesised parser in the future, since it correctly parses this.

comment:11 Changed 3 years ago by valkyrie

  • Keywords refactor added

comment:12 Changed 3 years ago by valkyrie

  • Status changed from assigned to in_merge

This was fixed alongside ticket #453 and is available on github as branch 269-453-direct_through_parend_parser. See notes on ticket #453 about elegance, etc.

Version 0, edited 3 years ago by valkyrie (next)

comment:13 Changed 3 years ago by valkyrie

Now in AFS

comment:14 Changed 3 years ago by jblayloc

Held up by inclusion of #534; Tibor to merge base commit and time #534, and then we can discuss any ongoing issues there.

comment:15 Changed 3 years ago by valkyrie

  • Status changed from in_merge to assigned

comment:16 Changed 3 years ago by valkyrie

  • Status changed from assigned to in_work

comment:17 Changed 3 years ago by valkyrie

  • Status changed from in_work to in_merge

269-453-parend_parser_improvements

--- see ticket #453

comment:18 Changed 11 months ago by simko

  • Owner changed from valkyrie to simko
  • Status changed from in_merge to in_review
Note: See TracTickets for help on using tickets.