Opened 21 months ago
Closed 18 months ago
#799 closed enhancement (fixed)
RefExtract: introduce author extraction mode
| Reported by: | simko | Owned by: | chayward |
|---|---|---|---|
| Priority: | major | Milestone: | |
| Component: | DocExtract | Version: | |
| Keywords: | | Cc: | |
Description
RefExtract should be enhanced with an author extraction mode that behaves like giva. That is, given an input PDF file, one should be able to run:
$ refextract --extract-authors -f 1:file.pdf
and RefExtract should then study the beginning of the file, looking for authors and affiliations, and output something like:
<datafield tag="100" ind1=" " ind2=" ">
<subfield code="a">Doe, J</subfield>
<subfield code="u">U. Foo</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Bloggs, J</subfield>
<subfield code="u">U. Bar</subfield>
</datafield>
<datafield tag="700" ind1=" " ind2=" ">
<subfield code="a">Mustermann, E</subfield>
<subfield code="u">U. Xyzzy</subfield>
<subfield code="u">U. Zyxxy</subfield>
</datafield>
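For illustration only, the mapping from detected authors to the fields above could be sketched as follows. This is a minimal sketch, not the actual RefExtract code: the function name authors_to_marcxml and the (name, affiliations) input shape are assumptions made for this example. It assumes the first detected author goes into tag 100 and all later authors into tag 700, as in the sample output.

```python
from xml.sax.saxutils import escape


def authors_to_marcxml(authors):
    """Render (name, [affiliations]) pairs as MARCXML datafields.

    Hypothetical helper for illustration: the first author is emitted
    as tag 100 and every following author as tag 700.
    """
    lines = []
    for index, (name, affiliations) in enumerate(authors):
        tag = "100" if index == 0 else "700"
        lines.append('<datafield tag="%s" ind1=" " ind2=" ">' % tag)
        # $a carries the author name; escape() guards against &, < and >.
        lines.append('<subfield code="a">%s</subfield>' % escape(name))
        # One $u subfield per affiliation, in the order detected.
        for affiliation in affiliations:
            lines.append('<subfield code="u">%s</subfield>' % escape(affiliation))
        lines.append('</datafield>')
    return "\n".join(lines)
```

Calling authors_to_marcxml([("Doe, J", ["U. Foo"]), ("Mustermann, E", ["U. Xyzzy", "U. Zyxxy"])]) would yield one 100 field and one 700 field, the latter with two $u subfields.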
In other words, refextract would provide two modes: the traditional --extract-references mode, which would remain the default, and a new --extract-authors mode, whose addition is the task of this ticket.
(Note that this may later raise the question of marking detected fields with provenance information in $2 and $9, so that author extraction can be automated on the back end and refextract-found fields do not overwrite human-edited fields.)
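The two-mode behaviour could be sketched with Python's argparse roughly as below. This is a hypothetical illustration of the option handling described in this ticket, not the real refextract command-line code: the parser layout, the build_parser name and the "mode" and "files" attribute names are assumptions. The value passed to -f is kept as an opaque string, exactly as given on the command line.

```python
import argparse


def build_parser():
    """Build a hypothetical parser with two mutually exclusive modes."""
    parser = argparse.ArgumentParser(prog="refextract")
    modes = parser.add_mutually_exclusive_group()
    modes.add_argument("--extract-references", dest="mode",
                       action="store_const", const="references",
                       help="extract references (default)")
    modes.add_argument("--extract-authors", dest="mode",
                       action="store_const", const="authors",
                       help="extract authors and affiliations")
    # -f may be repeated; each value is stored untouched, e.g. "1:file.pdf".
    parser.add_argument("-f", dest="files", action="append", default=[])
    # Reference extraction stays the default when no mode is given.
    parser.set_defaults(mode="references")
    return parser
```

With this sketch, build_parser().parse_args(["--extract-authors", "-f", "1:file.pdf"]) selects the author mode, while omitting the mode option falls back to reference extraction.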
Change History (2)
comment:1 Changed 18 months ago by simko
- Status changed from new to in_merge
comment:2 Changed 18 months ago by simko
- Resolution set to fixed
- Status changed from in_merge to closed

Merged in [b1759372d4e7f4c24a980f0e3e5fbaf69f41f8d1].