Opened 3 years ago

Last modified 15 months ago

#662 new enhancement

Improved BibTeX export

Reported by: skaplun Owned by:
Priority: critical Milestone:
Component: BibFormat Version:
Keywords: thesis theses CDS proceedings Cc: Sho.Maruyama@…

Description

Currently, BibTeX export is generated out of the box by a single output format that calls a single format element, namely bfe_bibtex.py. This implies that the business logic of the BibTeX export is hard-coded in one place and cannot easily be adapted to different document types. This has an impact, e.g. in CDS, when exporting Thesis documents.

One possible solution would be to have a generic bfe_bibtex_field.py, based on bfe_bibtex.format_bibtex_field, that is able to format certain subfields, and to provide demo BibTeX exports adapted to each specific document type.
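A minimal sketch of what such per-document-type exports could look like. Everything here is hypothetical: `format_field`, `TEMPLATES`, `export_bibtex` and the plain-dict record layout are illustrative names only, not the actual BibFormat or bfe_bibtex API, and the mapping of document types to BibTeX entry types is an assumption.

```python
# Hypothetical sketch: none of these names come from Invenio/BibFormat.

def format_field(name, value, width=12):
    """Render one BibTeX field line, or '' when the value is empty."""
    if not value:
        return ""
    return '      %s = "%s",\n' % (name.ljust(width), value)


# Assumed mapping: document type -> (BibTeX entry type, ordered field names).
TEMPLATES = {
    "ARTICLE": ("article",
                ["author", "title", "journal", "volume", "pages", "year"]),
    "THESIS": ("phdthesis",
               ["author", "title", "school", "number", "year"]),
    "PROCEEDINGS": ("inproceedings",
                    ["author", "title", "booktitle", "number", "year"]),
}


def export_bibtex(record):
    """Build a BibTeX entry from a plain dict describing the record.

    Unknown document types fall back to the ARTICLE template.
    """
    entry_type, fields = TEMPLATES.get(record.get("doctype"),
                                       TEMPLATES["ARTICLE"])
    body = "".join(format_field(name, record.get(name)) for name in fields)
    return "@%s{%s,\n%s}\n" % (entry_type, record["key"], body)
```

With one small template per document type, adding a report number or a booktitle for a given type becomes a data change rather than a change to hard-coded logic.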

Change History (7)

comment:1 Changed 3 years ago by skaplun

Some more on the subject (from Savannah)
[...]
Request came from a user (Yngve Inntjore Levinsen@…) :
I have a couple of proposals to the bibtex formatting you provide.
First off, I would like to propose a new entry "url", that equals the CDS url (or if you think it is more correct, directly to the ps/pdf document). Most bibtex readers support this tag, so that you get a link to the web page and/or document in this url.
The second I would like to propose is that instead of putting the oai: entry in the title, you create a new tag called oai. This would then eventually work in similar manner as the doi tag already does, and not clutter up the title.
You could also consider to use the tag "abstract" that many bibtex readers already know of. This is not necessary when you want to use the bibtex to reference in a paper, but it is nice when you have your own bibtex library of the documents that are relevant to you. Myself I have hundreds of papers in my bibtex file already, and use it for storing all articles that might be of interest to me now or in the future.
In order to explain what I mean, I attach a bibtex proposal to this article: http://cdsweb.cern.ch/record/1299163

@article{Gschwendtner:1299163, 
   author = "Gschwendtner, E and Apyan, A and Elsener, K and Sailer, A and Uythoven, J and  Appleby, R B and Salt, M and Ferrari, A and Ziemann, V", 
   title = "The CLIC Post-Collision Line.", 
   number = "EuCARD-CON-2010-030", 
   year = "2010", 
   url = "http://cdsweb.cern.ch/record/1299163", 
   oai = "cds.cern.ch:1299163", 
   abstract = "The 1.5 TeV CLIC beams, with a total power of 14 MW per beam, are disrupted at the interaction point due to the very strong beam-beam effect. As a result, some 3.5 MW reach the main dump in form of beamstrahlung photons. About 0.5 MW of e+e- pairs with a very broad energy spectrum need to be disposed of along the post-collision line. The conceptual design of this beam line will be presented. Emphasis will be on the optimization studies of the CLIC post-collision line design with respect to the energy deposition in windows, dumps and absorbers, on the design of the luminosity monitoring for a fast feedback to the beam steering and on the background conditions for the luminosity monitoring equipment." 
}

[...]
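A rough sketch of how the url and oai proposals might be derived, assuming the OAI identifier appears as a prefix of the form "oai:<host>:<record id>. " in front of the real title (as in the titles quoted in comment:5). The function names and the `base_url` default are illustrative assumptions, not existing CDS or BibFormat code.

```python
import re

# Assumed prefix shape: "oai:<host>:<record id>. <real title>"
_OAI_PREFIX = re.compile(
    r"^oai:(?P<oai>[^\s:]+:\d+)\.\s+(?P<title>.+)$", re.DOTALL)


def split_oai(title):
    """Return (oai_id, clean_title); oai_id is None when no prefix is found."""
    match = _OAI_PREFIX.match(title)
    if match is None:
        return None, title
    return match.group("oai"), match.group("title")


def proposed_fields(recid, raw_title,
                    base_url="http://cdsweb.cern.ch/record/"):
    """Build the title/url/oai fields suggested in the request above."""
    oai, title = split_oai(raw_title)
    fields = {"title": title, "url": base_url + str(recid)}
    if oai is not None:
        fields["oai"] = oai
    return fields
```

Titles without the prefix pass through unchanged, so this could be applied to every record without special-casing.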

comment:3 Changed 3 years ago by arwagner

Other fields come to mind, especially when going beyond journal articles. Years ago I did a quite extensive mapping from the PICA format (a library catalogue format) to BibTeX, which I could contribute upon request. Some samples:

@BOOK{613652282,

title = {{R}adiowave propagation: physics and applications},
publisher = {Wiley},
year = {2010},
author = {Levis, Curt A. AND Johnson, Joel T. AND Teixeira, Fernando L.},
pages = {XII, 301 S.},
address = {Hoboken, NJ},
isbn = {9780470542958},
rvk = {ZN 3240 L666},
comment = {Ill., graph. Darst.},
ddc = {621.384/11},
keywords = {ELT, P4, Radio / Radio wave propagation / },
language = {eng},
loc = {TK6565.A6},
timestamp = {2010.11.24},
url = {http://www.gbv.de/dms/ilmenau/toc/613652282.PDF}

}

Notice the fields sisbn (10-digit ISBN, which comes in handy if one wants to fetch covers ;), ddc, loc and keywords, which can be populated from the 082 and 6xx categories. For library materials one would probably also export the shelfmark (above in the rvk field, a common German system). The URL might also point to a scanned TOC.

Also note that it might be very sensible to escape capitals in {}.
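A minimal sketch of such escaping, assuming it is enough to wrap every run of ASCII capitals in braces; `protect_capitals` is a hypothetical helper, not part of bfe_bibtex. It does not skip text that is already braced, TeX commands or math, so a real implementation would need to be more careful.

```python
import re


def protect_capitals(title):
    """Wrap each run of ASCII capitals in braces so that BibTeX styles
    which lowercase titles leave those letters untouched."""
    return re.sub(r"[A-Z]+", lambda m: "{" + m.group(0) + "}", title)


print(protect_capitals("Radiowave propagation: physics and applications"))
print(protect_capitals("LHCb : From the detector to the first physics results"))
```

Protecting only the capitals, rather than the whole title, still lets a bibliography style lowercase the remaining words if it wants to.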

Finally, a record generated from arXiv:

@ARTICLE{Aubert-2005a,

author = {Aubert, B. and others},
title = {{S}earch for lepton flavor violation in the decay $\tau \to \mu \gamma$},
journal = {Physical Review Letters},
year = {2005},
volume = {95},
pages = {041802},
abstract = {A search for the nonconservation of lepton flavor number in the decay
tau±-->µ±gamma has been performed using 2.07×10^8 e+e--->tau+tau-
events produced at a center-of-mass energy near 10.58 GeV with the
BABAR detector at the PEP-II storage ring. We find no evidence for
a signal and set an upper limit on the branching ratio of [script
B](tau±-->µ±gamma)<6.8×10^-8 at 90% confidence level.},

collaboration = {BABAR},
doi = {10.1103/PhysRevLett.95.041802},
eprint = {hep-ex/0502032},
file = {Aubert_2005ye-eprint.pdf:Aubert_2005ye-eprint.pdf:PDF;hep-ex0502032.ps.gz:/scratch/arwagner/papers/Flavourviolation/hep-ex0502032.ps.gz:PDF},
slaccitation = {%%CITATION = HEP-EX 0502032;%%}

}

Note also the fields for eprint, doi and slaccitation. The file field contains links to locally stored full-text files (JabRef syntax).

If needed, I could give additional input on the issue :)

comment:4 follow-up: ↓ 5 Changed 2 years ago by skaplun

  • Keywords proceedings added
  • Priority changed from major to critical

Moreover, this is critical in real CDS use cases. If a researcher compiles a document bibliography by taking BibTeX from CDS, they currently obtain very poor export data. If they do not then enrich the bibliography by hand, this poor metadata ends up in their document, making it more difficult for automatic mining tools, cataloguers and other researchers to retrieve the intended document.

Take e.g.: <http://cdsweb.cern.ch/record/1390408>

Its BibTeX export currently does not mention any report number (such as LHCb-PROC-2011-060), nor does it mention that the document was presented at a conference.

@article{Callot:1390408,
      author       = "Callot, O",
      title        = "LHCb : From the detector to the first physics results",
      month        = "Oct",
      year         = "2011",
      note         = "Linked to talk LHCb-TALK-2011-176",
}

comment:5 in reply to: ↑ 4 Changed 2 years ago by jcaffaro

Replying to skaplun:

@article{Callot:1390408,

author = "Callot, O",
title = "LHCb : From the detector to the first physics results",
month = "Oct",
year = "2011",
note = "Linked to talk LHCb-TALK-2011-176",

}

A similar request came today. What could be the preferred output?

  A. @inproceedings{Callot:1390408,
      author       = "Callot, O",
      title        = "LHCb : From the detector to the first physics results",
      month        = "Oct",
      year         = "2011",
      note         = "Linked to talk LHCb-TALK-2011-176",
      crossref     = {1378086},
    }
    @proceedings{1378086,
      title        = "oai:cds.cern.ch:1378086. HEP-MAD 11, 5th High-Energy
                      Physics Conference in Madagascar",
      booktitle    = "oai:cds.cern.ch:1378086. HEP-MAD 11, 5th High-Energy
                      Physics Conference in Madagascar",
      year         = "2011",
      month        = "Aug"
    }
    
  B. @inproceedings{Callot:1390408,
      author       = "Callot, O",
      title        = "LHCb : From the detector to the first physics results",
      booktitle    = "oai:cds.cern.ch:1378086. HEP-MAD 11, 5th High-Energy
                      Physics Conference in Madagascar",
      month        = "Oct",
      year         = "2011",
      note         = "Linked to talk LHCb-TALK-2011-176",
    }
    
  C. @inproceedings{Callot:1390408,
      author       = "Callot, O",
      title        = "LHCb : From the detector to the first physics results",
      month        = "Oct",
      year         = "2011",
      note         = "Linked to talk LHCb-TALK-2011-176",
      howpublished = "oai:cds.cern.ch:1378086. HEP-MAD 11, 5th High-Energy
                      Physics Conference in Madagascar, Aug 2011",
    }
    

Some quick comments (leaving out HOW to implement the above):

  A. What if a bibliography is generated from a search query or a basket? The @proceedings entry might be repeated unnecessarily several times in the output, which might be an issue.
  B. What if the month and year of the conference differ from those of the contribution? Should these also be included in booktitle?
  C. Same as for B. Is it semantically more or less correct?

comment:6 Changed 2 years ago by skaplun

Well, according to http://en.wikipedia.org/wiki/BibTeX#Cross-referencing the first proposal should be the way to go. We might simply solve it with some pythonic hack :-) by introducing a closure or something similar that would not display the same @proceedings twice per request :-) (I can imagine a hackish format element that is aware of the request object and stores there the list of proceedings already output).

Alternatively, this hack could be generalized into an extension of BibFormat that allows for singleton formats of sorts (which cannot be output more than once per request).
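A rough sketch of the once-per-request idea, under the assumption that some per-request object can hold a small registry. `OncePerRequest` and `export_all` are hypothetical names, and the sketch queues the shared @proceedings entries and emits them at the end rather than inline, since BibTeX expects a cross-referenced entry to come after the entries that refer to it.

```python
from collections import OrderedDict


class OncePerRequest(object):
    """Remembers which shared entries were already queued in this request."""

    def __init__(self):
        self._seen = OrderedDict()  # citation key -> BibTeX text

    def add(self, key, bibtex):
        """Queue an entry; return True only the first time a key is seen."""
        if key in self._seen:
            return False
        self._seen[key] = bibtex
        return True

    def flush(self):
        """Return all queued entries, in first-seen order."""
        return "".join(self._seen.values())


def export_all(contributions):
    """contributions: iterable of (entry_text, proc_key, proc_text) tuples."""
    registry = OncePerRequest()  # hypothetically stored on the request object
    output = []
    for entry_text, proc_key, proc_text in contributions:
        output.append(entry_text)
        registry.add(proc_key, proc_text)  # duplicates are silently skipped
    # Shared @proceedings entries go last, each exactly once.
    output.append(registry.flush())
    return "".join(output)
```

The same registry could back the more general singleton-format extension, keyed by format name instead of by proceedings record.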

comment:7 Changed 15 months ago by skaplun

  • Type changed from task to enhancement