Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#506 closed defect (fixed)

Provide PS, type and field code searching

Reported by: tbrooks Owned by: jblayloc
Priority: major Milestone:
Component: WebSearch Version:
Keywords: INSPIRE syntax Cc:

Description

Depends on #505

Once #505 is completed, add knowledge bases (kbs) for the type code and field code abbreviations, as well as aliases in the SPIRES search syntax for:

PS, SCL, type, TC -> 690C_a (via doctype.kb)

FC, field -> 65017 (via a classifications.kb)
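
As an illustration only, the two alias groups above could be written as a lookup table. This is a minimal sketch: `SPIRES_ALIASES` and `resolve_alias` are hypothetical names, not part of Invenio or the SPIRES translation code, and the field and kb names are copied verbatim from the mapping above.

```python
# Hypothetical alias table: SPIRES keyword -> (MARC field, knowledge base file).
# The names here are illustrative and are not Invenio's actual API.
SPIRES_ALIASES = {
    "ps": ("690C_a", "doctype.kb"),
    "scl": ("690C_a", "doctype.kb"),
    "type": ("690C_a", "doctype.kb"),
    "tc": ("690C_a", "doctype.kb"),
    "fc": ("65017", "classifications.kb"),
    "field": ("65017", "classifications.kb"),
}


def resolve_alias(keyword):
    """Return (marc_field, kb_file) for a SPIRES keyword, or None if unknown.

    Matching is case-insensitive, so 'TC' and 'tc' resolve identically.
    """
    return SPIRES_ALIASES.get(keyword.strip().lower())
```

With this table, `resolve_alias("TC")` and `resolve_alias("ps")` both point at 690C_a, while a keyword outside the table (for example `a`) yields `None`.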

Both kbs are published in Travis's public inspire repo, in the PS_FC_kbs branch.

Additionally, indexes should be made for both of these MARC codes.

After this is all done, we should look at standardizing the coding across TC/Note/etc and FC/archive category/collection etc.

Change History (17)

comment:1 Changed 3 years ago by tbrooks

  • Owner set to valkyrie
  • Status changed from new to assigned

comment:2 Changed 3 years ago by valkyrie

  • Status changed from assigned to in_merge

I don't know how to build the indexes, but this is available in my public INSPIRE branch (/afs/slac.stanford.edu/public/groups/library/valkyrie-public-git/inspire-valkyrie.git/) as knowledge-bases.

comment:3 Changed 3 years ago by simko

  • Status changed from in_merge to assigned

1) You can make indexes by inserting proper configuration statements
to the top-level Makefile; see for example what I did earlier for
the "firstauthor" index. (INSPIRE repo, commit 2146ab19)

2) The journal index is already made, so you can test the journal
synonym searching (including volume and pages) even without creating
the other indexes. In the journal index synonym configuration, the
branch currently uses the massaging function leading_to_number, but
I think you should rather use leading_to_comma, because INSPIRE
convention for journal index is to separate journal,volume,page
values by commas. So with leading_to_number, journal searches
including volume and page would not work. For the two other indexes,
the exact massaging function seems appropriate.

3) For the two other classification/doctype indexes, we may perhaps
consider using the index-time synonyms instead of search-time
synonyms, especially if people are used to values like E from SPIRES
times.

4) But I'd like to clarify the terminology regarding
classification/doctype indexes first.

WRT "classification" KB, it generates values for 65017 field which is
called "subject" in cataloguing tools. What do we want to call the new
index in the user facing parts of INSPIRE, "classification" or
"subject" or "fc" or something else? Consider that people may be
typing and/or seeing query terms like classification:E or
classification:"Experiment-HEP", so we'd better choose something
nice. Maybe stick to "subject" like in cataloguing tools, maybe stick
to "fc" if we choose this to be the user-facing canonical index name
and not only an alias, etc.

WRT "doctype", there is a similar naming mismatch. Moreover, here the
word doctype has a very concrete meaning in Invenio, namely the type
of a document attached to a record. So it may be misleading to call
it that. (BTW, see also somewhat related filetype/doctype index issue
in ticket:473.)

5) When updating KB/index names, we may want to amend the following
description TALKTYPEDESC='Mapping of... something?' a bit. :)

6) You should also document in the commit log the weblinks/oalinks fix
done alongside the process. Ideally, we should perhaps separate this
fix into a commit of its own, since it is unrelated to synonym KBs.
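
To illustrate point 2 above, here is a rough sketch of how the three massaging functions could split a search pattern before the KB lookup. It assumes that leading_to_comma looks up the text before the first comma and leading_to_number the text before the first digit; `split_for_lookup` is a hypothetical helper written for this comment, not Invenio's own code.

```python
import re


def split_for_lookup(pattern, match_type):
    """Split a pattern into (part to look up in the KB, remainder kept as-is).

    Assumed semantics (not taken from Invenio's source):
      exact             -> the whole pattern is the lookup key
      leading_to_comma  -> the key is everything before the first comma
      leading_to_number -> the key is everything before the first digit
    """
    if match_type == "exact":
        return pattern, ""
    if match_type == "leading_to_comma":
        match = re.match(r"^(.*?)(\s*,.*)$", pattern)
    elif match_type == "leading_to_number":
        match = re.match(r"^(.*?)(\s*\d.*)$", pattern)
    else:
        raise ValueError("unknown match type: %r" % (match_type,))
    if match is None:
        # No comma (or digit) found: look up the whole pattern.
        return pattern, ""
    return match.group(1), match.group(2)
```

Under these assumptions, a comma-separated value such as `Phys.Rev.,D66,010001` splits at the journal name with leading_to_comma, whereas leading_to_number would cut inside the volume and produce a key that is unlikely to be in the KB.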

comment:4 follow-up: Changed 3 years ago by valkyrie

  • Status changed from assigned to infoneeded

Ok, with those comments in mind, I renamed the doctype and classification indices to "media" and "subject", and I corrected all the other comments. I didn't separate out the weblinks/oalinks fix, although if you feel strongly about it I can.

I am having trouble testing this fix, since there doesn't seem to be any information in 690c. http://inspirebeta.net/search?ln=en&ln=en&p=690__c%3AReview&action_search=Search&sf=&so=d&rm=&rg=25&sc=0&of=hb

?

Anyway, there are branches for both Invenio (for SPIRES syntax to translate the new index names correctly) and Inspire available in my afs repo as knowledge-bases.

Last edited 3 years ago by valkyrie

comment:5 in reply to: ↑ 4 Changed 3 years ago by tbrooks

  • Status changed from infoneeded to assigned

Replying to valkyrie:

Ok, with those comments in mind, I renamed the doctype and classification indices to "media"

Hmm, "media" sounds odd. Can we call it "type" or "type-code" (I prefer "type")?

I am having trouble testing this fix, since there doesn't seem to be any information in 690c. http://inspirebeta.net/search?ln=en&ln=en&p=690__c%3AReview&action_search=Search&sf=&so=d&rm=&rg=25&sc=0&of=hb

Try http://inspirebeta.net/search?ln=en&ln=en&p=690C%3AReview&action_search=Search&sf=&so=d&rm=&rg=25&sc=0&of=hb

It is 690C, not 690c. That's because the "C" is not a subfield. To be precise, it is 690C_a.

This may require a bit of checking/changing in your configuration...
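
A small sketch of the difference, assuming the usual six-character layout of tag (3 characters), first indicator, second indicator, and subfield code, with "_" standing for a blank indicator. `parse_marc_spec` is a hypothetical helper for illustration, not an Invenio function.

```python
def parse_marc_spec(spec):
    """Split a 6-character MARC spec such as '690C_a' into its parts.

    Assumed layout: 3-character tag, indicator 1, indicator 2, subfield code.
    An underscore stands for a blank indicator.
    """
    if len(spec) != 6:
        raise ValueError("expected a 6-character spec, got %r" % (spec,))
    return {
        "tag": spec[:3],
        "ind1": spec[3],
        "ind2": spec[4],
        "subfield": spec[5],
    }
```

Read this way, `690__c` asks for subfield c with blank indicators, while `690C_a` asks for subfield a with "C" in the first indicator position, so the two specs address different data.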

As a side comment: I certainly don't think we find these layers of MARC complexity valuable; I imagine only 1-2 people in INSPIRE know why there is a "C" there. In the long run, separating Invenio from MARC would be desirable in my opinion, although these layers could still be useful in the occasional case of exporting to others who use MARC.

comment:6 Changed 3 years ago by valkyrie

Ok, I renamed the index.

Sadly, I still can't make it work. Could one of you take a look at my indexing stuff in the makefile? The subject and journal indexes work just fine, but the type index isn't behaving. Is "type" a reserved word for some other reason? Anyway, if you can take a look, the code is in the inspire and invenio branches called knowledge-bases.

comment:7 Changed 3 years ago by simko

  • Keywords syntax added

comment:8 Changed 3 years ago by valkyrie

  • Status changed from assigned to in_merge

ok, y'all, this is now working

INSPIRE and invenio branches called knowledge-bases available in AFS

comment:9 Changed 3 years ago by jblayloc

It is worth noting that to work correctly this branch requires configuration directives to be set in invenio-local.conf, like:

CFG_WEBSEARCH_SYNONYM_KBRS = {
  'journal': ['JOURNALS', 'leading_to_comma'],
  'collection': ['COLLECTION', 'exact'],
  'subject': ['SUBJECT', 'exact'],
}

comment:10 Changed 3 years ago by jblayloc

  • Status changed from in_merge to assigned

comment:11 Changed 3 years ago by jblayloc

  • Owner changed from valkyrie to jblayloc

I think that this actually does work. I failed it so I could reassign it to myself, because I'm doing some cleanups and chasing down problems with the unit tests (the problems appear to be in the tests themselves.) I'm snapshotting to my github inspire and invenio repositories, in 506-knowledge_bases-rebased. I'll be deploying to inspire-hep-dev in a minute so people can check this out.

comment:12 Changed 3 years ago by jblayloc

  • Keywords DEPLOYED added
  • Status changed from assigned to in_merge

I have now deployed this on prod, as per INSPIRE RT#148083.

I have cherry-picked the Invenio patch into inspire-ops on branch rebased-20110816 (which is still our latest deployment target) and Travis has put the Inspire patch into the inspire repository.

The branches are on my AFS and github as 506-knowledge-bases-rebased.

comment:13 Changed 3 years ago by hoc

OK, for this part:
PS, SCL, type, TC -> 690C_a (via doctype.kb)

SCL searching doesn't work yet.

the search
find a smith and scl s (or scl p)
should simply be an alias for
find a smith and tc p

There are other SCL values (as Travis pointed out on EVO it's a blend of FC and TC) but the "published" one is the most important and we want to seriously deprecate other uses of it.

comment:14 Changed 3 years ago by hoc

Here's a problem with conference papers:

find a witten and tc c [does not work]
http://inspirebeta.net/search?ln=en&ln=en&p=find+a+witten+and+tc+c

find a witten and tc conference paper [does not work, this used to work]
http://inspirebeta.net/search?ln=en&ln=en&p=find+a+witten+and+tc+conference+paper

find a witten and tc conference [works, this is new]
http://inspirebeta.net/search?ln=en&ln=en&p=find+a+witten+and+tc+conference

comment:15 Changed 3 years ago by jblayloc

I've moved heath's comments to #791 because that's where I mean to take care of them. I think this ticket is still ready for merge. Tibor, you'll want to make sure you fetch the very latest version of the 506-knowledge_bases-rebased branches, because I did some squashing tonight.

comment:16 Changed 3 years ago by Valkyrie Savage <vasavage@…>

  • Resolution set to fixed
  • Status changed from in_merge to closed

In [c5c240e3e3ad27db467926d04de21aa2f84478a9]:

WebSearch: type and field codes

  • updated SPIRES mappings to reflect different indices for type and field codes and journal codens
  • tests for same (fixes #506)(fixes #521)
  • changes behavior of bibknowledge slightly so that exact kbr search by key, for the empty string, returns hits only if the empty string is actually a kbr key.
  • and tests for this behavior

Co-authored-by: Joe Blaylock <jrbl@…>
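
The bibknowledge change described in the commit message can be pictured with a toy model. This is only a sketch under stated assumptions: the KB is modelled as a plain dict, `lookup_exact` and `lookup_substring` are hypothetical names rather than bibknowledge functions, and the `"unknown"` value is made up for the example.

```python
def lookup_exact(kb, key):
    """Exact lookup by key: the empty string matches only if it is itself a key."""
    return [kb[key]] if key in kb else []


def lookup_substring(kb, key):
    """Substring lookup, shown for contrast: the empty string matches every key."""
    return [value for k, value in kb.items() if key in k]


# Toy knowledge base; the E -> Experiment-HEP pair follows the example in comment:3.
subject_kb = {"E": "Experiment-HEP"}
```

Here `lookup_exact(subject_kb, "")` returns no hits, while the same call against a KB that really contains an empty key, such as `{"": "unknown"}`, returns that key's value.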

comment:17 Changed 3 years ago by simko

  • Keywords DEPLOYED removed