Opened 3 years ago

Last modified 5 months ago

#804 new defect

WebSearch: prevent infinite synonyms lookup

Reported by: jblayloc
Owned by: jcaffaro
Priority: major
Milestone: v1.2
Component: WebSearch
Version: maint-1.1
Keywords: DEPLOYED INSPIRE
Cc:

Description

The synonym lookup in search_unit calls search_unit again for each synonym it finds. If the knowledge base maps a synonym back to the original term, the calls recurse without end and overflow the stack.

So I added a flag to search_unit that makes the calls issued for synonyms skip the synonym lookup.
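To illustrate the failure mode and the flag-based fix, here is a minimal sketch. It is not the real search_unit: the names SYNONYMS, search_unit_naive, search_unit_flagged and expand_synonyms are hypothetical, and the functions return the set of terms that would be searched rather than a hitset of records.

```python
# Hypothetical knowledge base containing a cycle: A -> B and B -> A.
SYNONYMS = {"A": ["B"], "B": ["A"]}


def search_unit_naive(term):
    """Expand synonyms on every call; never terminates on a cycle."""
    searched = {term}
    for synonym in SYNONYMS.get(term, []):
        searched |= search_unit_naive(synonym)
    return searched


def search_unit_flagged(term, expand_synonyms=True):
    """Expand synonyms only on the top-level call.

    Calls issued for a synonym pass expand_synonyms=False, so the
    recursion depth is bounded at one level.
    """
    searched = {term}
    if expand_synonyms:
        for synonym in SYNONYMS.get(term, []):
            searched |= search_unit_flagged(synonym, expand_synonyms=False)
    return searched
```

With the cyclic knowledge base above, search_unit_naive("A") keeps alternating between A and B until Python raises a recursion error, while search_unit_flagged("A") returns after a single level of expansion.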

Attachments (1)

0001-WebSearch-fix-infinite-recursion-in-search_unit.patch (2.5 KB) - added by jblayloc 3 years ago.
The patch that fixes this bug

Change History (11)

Changed 3 years ago by jblayloc

The patch that fixes this bug

comment:1 Changed 3 years ago by jblayloc

  • Status changed from new to in_merge
  • Summary changed from 500 Internal Server Errors on Journal Lookup to [PATCH] 500 Internal Server Errors on Journal Lookup

comment:2 Changed 12 months ago by jcaffaro

  • Summary changed from [PATCH] 500 Internal Server Errors on Journal Lookup to WebSearch: prevent infinite synonyms lookup

An issue with the provided solution is that synonym expansion stops after one lookup. For example, with the following knowledge base:

A->B
B->C

a search for A will search for either A or B but not C.

A branch that solves the above limitation is available in jerome/804-master-websearch-fix-synonyms-infinite-recursion

comment:3 Changed 11 months ago by simko

  • Owner set to simko
  • Status changed from in_merge to in_review

comment:4 Changed 8 months ago by simko

  • Milestone set to v1.2
  • Version set to maint-1.1

comment:5 Changed 8 months ago by simko

  • Status changed from in_review to in_integration

Thanks, merging into maint-1.1 as well.

comment:6 Changed 8 months ago by jcaffaro

  • Resolution set to fixed
  • Status changed from in_integration to closed

In ed6d4d54d29c1a791855fa245fa9a9cecc542c12/invenio:

WebSearch: fix infinite synonym lookup cases

  • Fixes infinite recursion when a knowledge base that is used for synonym lookup contains a cycle (A->B, B->A). Adds 'ignore_synonyms' parameter to search_unit() in order to control which synonyms have already been translated and should consequently be ignored. (closes #804)

Reviewed-by: Tibor Simko <tibor.simko@…>
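The idea described in the commit message can be sketched as follows. This is only an illustration under assumptions, not the committed code: the parameter name ignore_synonyms comes from the commit message, but its type (a set here), the extra synonyms argument and the return value (the set of terms searched) are hypothetical simplifications.

```python
def search_unit(term, synonyms, ignore_synonyms=None):
    """Return every term reachable from `term` through synonym lookups.

    `synonyms` is a hypothetical dict standing in for the knowledge base.
    `ignore_synonyms` collects the terms that have already been
    translated, so each term is expanded at most once.
    """
    if ignore_synonyms is None:
        ignore_synonyms = set()
    ignore_synonyms.add(term)

    searched = {term}
    for synonym in synonyms.get(term, []):
        if synonym not in ignore_synonyms:
            # The same set is shared by all nested calls, which is what
            # breaks cycles while still following chains.
            searched |= search_unit(synonym, synonyms, ignore_synonyms)
    return searched


chain = {"A": ["B"], "B": ["C"]}   # A->B, B->C
cycle = {"A": ["B"], "B": ["A"]}   # A->B, B->A

chain_result = search_unit("A", chain)
cycle_result = search_unit("A", cycle)
```

Tracking the already-translated terms, instead of switching expansion off after one level, lets a chain such as A->B, B->C be followed to C while a cycle such as A->B, B->A still terminates.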

comment:7 Changed 8 months ago by simko

  • Owner changed from simko to jcaffaro

comment:10 Changed 5 months ago by skaplun

  • Keywords INSPIRE added
  • Resolution fixed deleted
  • Status changed from closed to new

We seem to have a regression in INSPIRE:

2013-11-14 23:58:57 --> Unexpected error occurred: 'list' object is not callable.
2013-11-14 23:58:57 --> Traceback is:
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibtask.py", line 531, in task_init
2013-11-14 23:58:57 -->     ret = _task_run(task_run_fnc)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibtask.py", line 1067, in _task_run
2013-11-14 23:58:57 -->     if callable(task_run_fnc) and task_run_fnc():
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank.py", line 159, in task_run_core
2013-11-14 23:58:57 -->     func_object(key)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_tag_based_indexer.py", line 443, in citation
2013-11-14 23:58:57 -->     return bibrank_engine(run)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_tag_based_indexer.py", line 356, in bibrank_engine
2013-11-14 23:58:57 -->     func_object(rank_method_code, cfg_name, config)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_tag_based_indexer.py", line 68, in citation_exec
2013-11-14 23:58:57 -->     dic, index_update_time = get_citation_weight(rank_method_code, config)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_citation_indexer.py", line 140, in get_citation_weight
2013-11-14 23:58:57 -->     weights = process_and_store(updated_recids, config, chunk_size)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_citation_indexer.py", line 176, in process_and_store
2013-11-14 23:58:57 -->     cites, refs = process_chunk(chunk, config)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_citation_indexer.py", line 209, in process_chunk
2013-11-14 23:58:57 -->     config)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_citation_indexer.py", line 766, in ref_analyzer
2013-11-14 23:58:57 -->     config=config)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/bibrank_citation_indexer.py", line 90, in get_recids_matching_query
2013-11-14 23:58:57 -->     ret = search_pattern(p=p, f=f, m=m) & recids_cache(collections)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/search_engine.py", line 2064, in search_pattern
2013-11-14 23:58:57 -->     basic_search_unit_hitset = search_unit(bsu_p, bsu_f, bsu_m, wl)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/search_engine.py", line 2311, in search_unit
2013-11-14 23:58:57 -->     ignore_synonyms)
2013-11-14 23:58:57 -->   File "/usr/lib64/python2.6/site-packages/invenio/search_engine.py", line 2311, in search_unit
[...]
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/invenio/search_engine.py", line 2290, in search_unit
2013-11-14 23:58:58 -->     tokenizer = get_field_tokenizer_type(f)
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/invenio/search_engine.py", line 460, in get_field_tokenizer_type
2013-11-14 23:58:58 -->     field_tokenizer_cache.recreate_cache_if_needed()
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/invenio/data_cacher.py", line 76, in recreate_cache_if_needed
2013-11-14 23:58:58 -->     if self.timestamp_verifier() > self.timestamp:
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/invenio/search_engine.py", line 448, in timestamp_verifier
2013-11-14 23:58:58 -->     return get_table_update_time('idxINDEX')
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/invenio/dbquery.py", line 419, in get_table_update_time
2013-11-14 23:58:58 -->     run_on_slave=run_on_slave)
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/invenio/dbquery.py", line 256, in run_sql
2013-11-14 23:58:58 -->     rc = cur.execute(sql, param)
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/MySQLdb/cursors.py", line 168, in execute
2013-11-14 23:58:58 -->     self.errorhandler(self, TypeError, m)
2013-11-14 23:58:58 -->   File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
2013-11-14 23:58:58 -->     raise errorclass, errorvalue
2013-11-14 23:58:58 --> Exiting.