#506 closed defect (fixed)
Provide PS, type and field code searching
| Reported by: | tbrooks | Owned by: | jblayloc |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | WebSearch | Version: | |
| Keywords: | INSPIRE syntax | Cc: | |
Description
Depends on #505
Once #505 is completed, add KBs for the type code and field code abbreviations, as well as aliases in the SPIRES search syntax for:
PS, SCL, type, TC -> 690C_a (via doctype.kb)
FC, field -> 65017 (via a classifications.kb)
Both KBs are published in Travis's public inspire repo, in the PS_FC_kbs branch.
Additionally, indexes should be made for both of these MARC codes.
After this is all done, we should look at standardizing the coding across TC/Note/etc. and FC/archive category/collection, etc.
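As a rough illustration of the alias mapping requested in this description, the SPIRES keywords could be resolved to MARC tags through a plain lookup table. This is only a sketch: the names `SPIRES_ALIASES` and `resolve_spires_keyword` are hypothetical and are not part of Invenio; only the keywords and tags are taken from this ticket.

```python
# Hypothetical lookup table; keywords and MARC tags are the ones listed in this ticket.
SPIRES_ALIASES = {
    "ps": "690C_a",
    "scl": "690C_a",
    "type": "690C_a",
    "tc": "690C_a",
    "fc": "65017",
    "field": "65017",
}


def resolve_spires_keyword(keyword):
    """Return the MARC tag for a SPIRES keyword, or None if it is not a known alias."""
    return SPIRES_ALIASES.get(keyword.strip().lower())


if __name__ == "__main__":
    for kw in ("TC", "scl", "Field", "title"):
        print(kw, "->", resolve_spires_keyword(kw))
```

Keeping the aliases in one table would make it easier to standardize the coding later, since every keyword that points at the same tag is visible in one place.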
Change History (17)
comment:1 Changed 2 years ago by tbrooks
- Owner set to valkyrie
- Status changed from new to assigned
comment:2 Changed 2 years ago by valkyrie
- Status changed from assigned to in_merge
comment:3 Changed 2 years ago by simko
- Status changed from in_merge to assigned
1) You can make indexes by inserting proper configuration statements
to the top-level Makefile; see for example what I did earlier for
the "firstauthor" index. (INSPIRE repo, commit 2146ab19)
2) The journal index is already made, so you can test the journal
synonym searching (including volume and pages) even without creating
the other indexes. In the journal index synonym configuration, the
branch currently uses the massaging function leading_to_number, but
I think you should rather use leading_to_comma, because INSPIRE
convention for journal index is to separate journal,volume,page
values by commas. So with leading_to_number, journal searches
including volume and page would not work. For the two other indexes,
the exact massaging function seems appropriate.
3) For the two other classification/doctype indexes, we may perhaps
consider using the index-time synonyms instead of search-time
synonyms, especially if people are used to values like E from SPIRES
times.
4) But I'd like to clarify the terminology regarding
classification/doctype indexes first.
WRT "classification" KB, it generates values for 65017 field which is
called "subject" in cataloguing tools. How do we want to call the new
index in the user facing parts of INSPIRE, "classification" or
"subject" or "fc" or something else? Consider that people may be
typing and/or seeing query terms like classification:E or
classification:"Experiment-HEP", so we'd better choose something
nice. Maybe stick to "subject" like in cataloguing tools, maybe stick
to "fc" if we choose this to be the user-facing canonical index name
and not only an alias, etc.
WRT "doctype", there is a similar naming mismatch. Moreover, here the
word doctype has a very concrete meaning in Invenio, namely the type
of a document attached to a record. So it may be misleading to call
it that. (BTW, see also somewhat related filetype/doctype index issue
in ticket:473.)
5) When updating KB/index names, we may want to amend the following
description TALKTYPEDESC='Mapping of... something?' a bit. :)
6) You should also document in the commit log the weblinks/oalinks fix
done alongside the process. Ideally, we should perhaps separate this
fix into a commit of its own, since it is unrelated to synonym KBs.
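To illustrate point 2 above, here is a simplified sketch of what the two massaging functions might do to a journal term before the KB lookup. It is an assumption about their behaviour, not a copy of the Invenio code, and the function names `split_leading_to_number` and `split_leading_to_comma` as well as the sample term are illustrative only.

```python
import re


def split_leading_to_number(term):
    """Split at the first digit: (part to look up, remainder starting at the digit)."""
    match = re.match(r"^(.*?)(\s*\d.*)$", term)
    if match:
        return match.group(1), match.group(2)
    return term, ""


def split_leading_to_comma(term):
    """Split at the first comma: (part to look up, remainder starting at the comma)."""
    match = re.match(r"^(.*?)(\s*,.*)$", term)
    if match:
        return match.group(1), match.group(2)
    return term, ""


if __name__ == "__main__":
    term = "Phys.Rev.,D50,1140"  # illustrative journal,volume,page value
    print(split_leading_to_comma(term))
    print(split_leading_to_number(term))
```

With a comma-separated value, splitting at the first comma isolates the journal name for lookup, whereas splitting at the first digit leaves the comma and volume letter attached to it, so the KB lookup would not find a match.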
comment:4 follow-up: ↓ 5 Changed 2 years ago by valkyrie
- Status changed from assigned to infoneeded
Ok, with those comments in mind, I renamed the doctype and classification indices to "media" and "subject", and I corrected all the other comments. I didn't separate out the weblinks/oalinks fix, although if you feel strongly about it I can.
I am having trouble testing this fix, since there doesn't seem to be any information in 690c. http://inspirebeta.net/search?ln=en&ln=en&p=690__c%3AReview&action_search=Search&sf=&so=d&rm=&rg=25&sc=0&of=hb
?
Anyway, there are branches for both Invenio (for SPIRES syntax to translate the new index names correctly) and Inspire available in my afs repo as knowledge-bases.
comment:5 in reply to: ↑ 4 Changed 2 years ago by tbrooks
- Status changed from infoneeded to assigned
Replying to valkyrie:
Ok, with those comments in mind, I renamed the doctype and classification indices to "media"
Hmm, "media" sounds odd. Can we call it "type" or "type-code" (I prefer "type")?
I am having trouble testing this fix, since there doesn't seem to be any information in 690c. http://inspirebeta.net/search?ln=en&ln=en&p=690__c%3AReview&action_search=Search&sf=&so=d&rm=&rg=25&sc=0&of=hb
It is 690C, not 690c. That's because the "C" is not a subfield. To be precise, it is 690C_a.
This may require a bit of checking/changing in your configuration...
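The difference between 690C_a and 690__c can be made explicit by splitting the six-character spec into its parts. The sketch below assumes the usual layout of a three-character tag, two indicators and one subfield code, with an underscore standing for a blank indicator; `MarcSpec` and `parse_marc_spec` are hypothetical names, not Invenio functions.

```python
from collections import namedtuple

MarcSpec = namedtuple("MarcSpec", "tag ind1 ind2 subfield")


def parse_marc_spec(spec):
    """Split a 6-character spec such as '690C_a' into tag, indicators and subfield.

    Assumption: an underscore stands for a blank indicator.
    """
    if len(spec) != 6:
        raise ValueError("expected 6 characters, got %r" % spec)

    def blank(ch):
        return " " if ch == "_" else ch

    return MarcSpec(spec[:3], blank(spec[3]), blank(spec[4]), spec[5])


if __name__ == "__main__":
    print(parse_marc_spec("690C_a"))
    print(parse_marc_spec("690__c"))
```

In the first spec the "C" lands in the first indicator and the subfield is "a"; in the second both indicators are blank and the subfield is "c", which is why the test search on 690__c finds nothing.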
As a side comment: I certainly don't think we find these layers of MARC complexity valuable; I imagine only 1-2 people in INSPIRE know why there is a "C" there. In the long run, a separation of Invenio from MARC would be desirable in my opinion, but in the occasional case of exporting to others using MARC, they could potentially still be useful.
comment:6 Changed 2 years ago by valkyrie
Ok, I renamed the index.
Sadly, I still can't make it work. Could one of you take a look at my indexing stuff in the Makefile? The subject and journal indexes work just fine, but the type index isn't behaving. Is "type" a reserved word for some other reason? Anyway, the code, if you can take a look, is in the inspire/invenio branches called knowledge-bases.
comment:7 Changed 2 years ago by simko
- Keywords syntax added
comment:8 Changed 2 years ago by valkyrie
- Status changed from assigned to in_merge
ok, y'all, this is now working
INSPIRE and invenio branches called knowledge-bases available in AFS
comment:9 Changed 22 months ago by jblayloc
It is worth noting that to work correctly this branch requires configuration directives to be set in invenio-local.conf, like:
CFG_WEBSEARCH_SYNONYM_KBRS = {
'journal': ['JOURNALS', 'leading_to_comma'],
'collection': ['COLLECTION', 'exact'],
'subject': ['SUBJECT', 'exact'],
}
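For readers unfamiliar with the directive, the following sketch shows how a mapping of this shape could drive a search-time synonym lookup. It is a toy model, not the Invenio code path: `SYNONYM_KBRS`, `TOY_KBS` and `expand_term` are hypothetical names, and the single KB entry is an illustrative value based on the E / Experiment-HEP example in comment:3.

```python
# Same shape as the directive above: index name -> [KB name, match type].
SYNONYM_KBRS = {
    "journal": ["JOURNALS", "leading_to_comma"],
    "collection": ["COLLECTION", "exact"],
    "subject": ["SUBJECT", "exact"],
}

# Toy stand-in for the knowledge bases; the entry is illustrative only.
TOY_KBS = {
    "SUBJECT": {"E": "Experiment-HEP"},
}


def expand_term(index, term):
    """Return the term plus its synonym, if the index has an 'exact' KB entry for it."""
    if index not in SYNONYM_KBRS:
        return [term]
    kb_name, match_type = SYNONYM_KBRS[index]
    if match_type != "exact":
        # Other match types need the term to be split first; omitted in this sketch.
        return [term]
    synonym = TOY_KBS.get(kb_name, {}).get(term)
    return [term, synonym] if synonym else [term]


if __name__ == "__main__":
    print(expand_term("subject", "E"))
    print(expand_term("title", "E"))
```

An index that is missing from the mapping gets no expansion at all, which is one way a working branch can appear broken when the local configuration is incomplete.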
comment:10 Changed 22 months ago by jblayloc
- Status changed from in_merge to assigned
comment:11 Changed 22 months ago by jblayloc
- Owner changed from valkyrie to jblayloc
I think that this actually does work. I failed it so I could reassign it to myself, because I'm doing some cleanups and chasing down problems with the unit tests (the problems appear to be in the tests themselves.) I'm snapshotting to my github inspire and invenio repositories, in 506-knowledge_bases-rebased. I'll be deploying to inspire-hep-dev in a minute so people can check this out.
comment:12 Changed 22 months ago by jblayloc
- Keywords DEPLOYED added
- Status changed from assigned to in_merge
I have now deployed this on prod, as per INSPIRE RT#148083.
I have cherry-picked the Invenio patch into inspire-ops on branch rebased-20110816 (which is still our latest deployment target) and Travis has put the Inspire patch into the inspire repository.
The branches are on my AFS and github as 506-knowledge-bases-rebased.
comment:13 Changed 22 months ago by hoc
OK, for this part:
PS, SCL, type, TC -> 690C_a (via doctype.kb)
SCL searching doesn't work yet.
the search
find a smith and scl s (or scl p)
should simply be an alias for
find a smith and tc p
There are other SCL values (as Travis pointed out on EVO, it's a blend of FC and TC), but the "published" one is the most important and we want to seriously deprecate other uses of it.
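The requested alias could be sketched as a simple query rewrite that touches only the "published" forms and leaves every other SCL value alone. This is an illustration of the desired behaviour, not the Invenio SPIRES parser; `rewrite_scl` and `_SCL_PUBLISHED` are hypothetical names.

```python
import re

# Matches 'scl s' or 'scl p' as whole words, case-insensitively.
_SCL_PUBLISHED = re.compile(r"\bscl\s+[sp]\b", re.IGNORECASE)


def rewrite_scl(query):
    """Rewrite the 'published' SCL forms to the equivalent TC search."""
    return _SCL_PUBLISHED.sub("tc p", query)


if __name__ == "__main__":
    print(rewrite_scl("find a smith and scl s"))
    print(rewrite_scl("find a smith and scl p"))
```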
comment:14 Changed 22 months ago by hoc
Here's a problem with conference papers:
find a witten and tc c [does not work]
http://inspirebeta.net/search?ln=en&ln=en&p=find+a+witten+and+tc+c
find a witten and tc conference paper [does not work, this used to work]
http://inspirebeta.net/search?ln=en&ln=en&p=find+a+witten+and+tc+conference+paper
find a witten and tc conference [works, this is new]
http://inspirebeta.net/search?ln=en&ln=en&p=find+a+witten+and+tc+conference
comment:15 Changed 22 months ago by jblayloc
I've moved Heath's comments to #791 because that's where I mean to take care of them. I think this ticket is still ready for merge. Tibor, you'll want to make sure you fetch the very latest version of the 506-knowledge_bases-rebased branches, because I did some squashing tonight.
comment:16 Changed 22 months ago by Valkyrie Savage <vasavage@…>
- Resolution set to fixed
- Status changed from in_merge to closed
comment:17 Changed 22 months ago by simko
- Keywords DEPLOYED removed

I don't know how to build the indexes, but this is available in my public INSPIRE branch (/afs/slac.stanford.edu/public/groups/library/valkyrie-public-git/inspire-valkyrie.git/) as knowledge-bases.