Opened 3 years ago

Last modified 23 months ago

#438 new enhancement

WebSubmit: preserve bookmarks when stamping PDF files

Reported by: jcaffaro Owned by:
Priority: major Milestone:
Component: WebSubmit Version:
Keywords: Cc:

Description

The WebSubmit file stamper relies on the pdftk tool to process PDF files. Unfortunately, pdftk does not preserve bookmarks (e.g. the table of contents, or "outlines") when stamping, even though it is able to dump them.

Ghostscript could be used instead to stamp files, as it can preserve such information.

Ghostscript could fully replace pdftk in this case, or could be used simply to enrich the resulting stamped file with the necessary instructions to create the bookmarks (PDFMark?), provided that these instructions can be extracted correctly from the original file.

Some additional untested information:

$ pdftk original_file.pdf dump_data
InfoKey: Creator
InfoValue: LaTeX with hyperref package
NumberOfPages: 88
BookmarkTitle: Introduction
BookmarkLevel: 1
BookmarkPageNumber: 9
BookmarkTitle: Context
BookmarkLevel: 2
BookmarkPageNumber: 9
BookmarkTitle: CERN
BookmarkLevel: 3
$ pdftk stamped_file.pdf update_info original_file.info output new_stamped_file

(does not seem to add the necessary information unfortunately)
$ gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -o stamped_file.pdf original_file.pdf stamp.pdf

(would apparently merge the two files and keep the bookmarks from both)
$ gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -o new_stamped_file.pdf \
       -c "[/Page 1 /View [/XYZ null null null] /Title (Title 1) /OUT pdfmark" \
       -c "[/Page 2 /View [/XYZ null null null] /Title (Title 2) /OUT pdfmark" \
       -f stamped_file.pdf

(would apparently add the specified bookmark instructions to the given file. They would have to be extracted from the original file and cleaned up to fit this syntax)
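
The extraction and cleaning step could look roughly like the following Python sketch. It is untested against real pdftk or Ghostscript runs and rests on a few assumptions: the dump contains the BookmarkTitle / BookmarkLevel / BookmarkPageNumber lines shown above, entries missing a level or a page number are skipped, and titles are plain ASCII (pdftk may encode other characters, which is not handled here). The function names are made up for this illustration. The /Count key is the pdfmark way of declaring how many direct children a bookmark has, which is needed to keep the nesting.

```python
def parse_bookmarks(dump_text):
    """Collect bookmarks from the text printed by `pdftk file.pdf dump_data`."""
    bookmarks = []
    for line in dump_text.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue
        if key == "BookmarkTitle":
            # A title line starts a new bookmark entry.
            bookmarks.append({"title": value})
        elif key == "BookmarkLevel" and bookmarks:
            bookmarks[-1]["level"] = int(value)
        elif key == "BookmarkPageNumber" and bookmarks:
            bookmarks[-1]["page"] = int(value)
    # Drop incomplete entries (for example a truncated dump).
    return [b for b in bookmarks if "level" in b and "page" in b]


def escape_pdf_string(text):
    """Escape backslashes and parentheses for a PostScript (...) string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def to_pdfmarks(bookmarks):
    """Turn parsed bookmarks into /OUT pdfmark lines, one per bookmark."""
    marks = []
    for index, bookmark in enumerate(bookmarks):
        # Count direct children: following entries one level deeper,
        # until an entry at the same or a shallower level is reached.
        children = 0
        for following in bookmarks[index + 1:]:
            if following["level"] <= bookmark["level"]:
                break
            if following["level"] == bookmark["level"] + 1:
                children += 1
        count = "/Count %d " % children if children else ""
        marks.append(
            "[%s/Page %d /View [/XYZ null null null] /Title (%s) /OUT pdfmark"
            % (count, bookmark["page"], escape_pdf_string(bookmark["title"]))
        )
    return marks
```

Each returned line could then be passed to gs after a -c switch, or written to a file that is given to gs as an extra input.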

In any case, one should be careful if bookmarks link to absolute page numbers, as the number of pages can change when a cover page is added. Ghostscript would probably not handle this by itself.
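
If the bookmarks are re-created by hand, the page numbers could be corrected before they are written back. A minimal sketch, assuming bookmarks are held as (title, level, page) tuples with 1-based page numbers; the function and parameter names are invented for this example:

```python
def shift_bookmark_pages(bookmarks, inserted_at=1, inserted_count=1):
    """Adjust bookmark targets after pages were inserted into the document.

    inserted_at is the 1-based page (in the original numbering) before
    which the new pages were inserted; 1 means a cover page at the front.
    """
    shifted = []
    for title, level, page in bookmarks:
        if page >= inserted_at:
            # Everything at or after the insertion point moves down.
            page += inserted_count
        shifted.append((title, level, page))
    return shifted
```

With the defaults (one cover page at the front), a bookmark pointing to page 9 would point to page 10 afterwards. Pages appended at the end of the file would leave all targets unchanged.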

Change History (1)

comment:1 Changed 23 months ago by skaplun

Alternatively, a Pythonic version of this method could be implemented using the pyPdf library, in particular its mergePage method. In principle, pyPdf is able to copy all the metadata of the original PDF over to the new one, so it should be possible to carry the bookmarks over as well.
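
A rough, untested sketch of the stamping part with pyPdf is given below. It only merges the first page of the stamp file onto every page of the original; it does not copy the bookmarks, which would still have to be read from the original (for instance via the reader's getOutlines method) and re-created in the output by some means not shown here. The function names and file arguments are placeholders.

```python
def stamp_pages(reader, stamp_page, writer):
    """Merge stamp_page onto every page of reader and add the result to writer."""
    for index in range(reader.getNumPages()):
        page = reader.getPage(index)
        # mergePage adds the stamp's content stream to this page.
        page.mergePage(stamp_page)
        writer.addPage(page)
    return writer


def stamp_file(original_path, stamp_path, output_path):
    """Stamp original_path with the first page of stamp_path."""
    # Imported here so that stamp_pages can be tried without pyPdf installed.
    from pyPdf import PdfFileReader, PdfFileWriter

    with open(original_path, "rb") as original, open(stamp_path, "rb") as stamp:
        reader = PdfFileReader(original)
        stamp_page = PdfFileReader(stamp).getPage(0)
        writer = stamp_pages(reader, stamp_page, PdfFileWriter())
        # The input files must stay open until the output is written.
        with open(output_path, "wb") as output:
            writer.write(output)
```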
