PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC


Posted: Thu Sep 14, 2017 12:43 pm

Joined: Thu Sep 14, 2017 12:35 pm
Posts: 3
Hi folks,

I'm trying to split a PDF file at its bookmarks using PDFsharp. I can get a list of the bookmarks, but I cannot figure out which page number corresponds to each bookmark.

Back story: one of the engineering software packages we use generates a single PDF file that actually consists of three separate documents. In its infinite wisdom, the enterprise software company behind it doesn't let you split these and save them as separate PDFs. There are also a couple of other quirks we post-process, so I have a small utility that engineers run their output files through, and I'd like to add the ability to split that combined PDF into separate documents.

The example PDF file I am working with has three top-level bookmarks, on pages 1, 5 and 6. I can see the bookmarks with the snippet below, but I couldn't find a way to map a bookmark to a page number.

Splitting the PDF seems to be fairly well documented; what I am stuck on is how to map bookmarks to page numbers.

Test Code:

Code:
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

using (PdfDocument document = PdfReader.Open("test.pdf", PdfDocumentOpenMode.Import))
{
    // The outline (bookmark) tree hangs off the document catalog; null if the file has no bookmarks.
    PdfDictionary outline = document.Internals.Catalog.Elements.GetDictionary("/Outlines");

    Console.WriteLine("Page count: " + document.PageCount);

    foreach (PdfPage page in document.Pages)
    {
        // Any hierarchy info on the page itself? Doesn't seem to have any.
        Console.WriteLine(page.ToString());
    }

    // Walk the top-level bookmarks: start at /First and follow the /Next links.
    for (PdfDictionary child = outline?.Elements.GetDictionary("/First"); child != null; child = child.Elements.GetDictionary("/Next"))
    {
        Console.WriteLine(child.Elements.GetString("/Title"));

        // FIXME: get page numbers?
    }
}


Results in:

Code:
Page count: 9
<< /Contents [ 1019 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 3874 2667 ] /Parent 1 0 R /Resources 1018 0 R /Type /Page >>
<< /Contents [ 1022 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 3874 2667 ] /Parent 1 0 R /Resources 1021 0 R /Type /Page >>
<< /Contents [ 1025 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 3874 2667 ] /Parent 1 0 R /Resources 1024 0 R /Type /Page >>
<< /Contents [ 1028 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 3874 2667 ] /Parent 1 0 R /Resources 1027 0 R /Type /Page >>
<< /Contents [ 1032 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 842 595 ] /Parent 1 0 R /Resources 1031 0 R /Type /Page >>
<< /Annots [ 46 0 R 48 0 R 50 0 R 52 0 R 54 0 R 56 0 R 58 0 R 60 0 R 62 0 R 64 0 R 66 0 R 68 0 R 70 0 R 72 0 R 74 0 R ] /Contents [ 1043 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 1130 799 ] /Parent 1 0 R /Resources 1042 0 R /Type /Page >>
<< /Annots [ 82 0 R 84 0 R 86 0 R 88 0 R 90 0 R 92 0 R 94 0 R 96 0 R 98 0 R 100 0 R 102 0 R 104 0 R 106 0 R 108 0 R 110 0 R 112 0 R 114 0 R 116 0 R 118 0 R 120 0 R 122 0 R 124 0 R 126 0 R 128 0 R 130 0 R 132 0 R 134 0 R 136 0 R 138 0 R 140 0 R 142 0 R 144 0 R 146 0 R 148 0 R 150 0 R 152 0 R 154 0 R 156 0 R 158 0 R ] /Contents [ 1048 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 1130 799 ] /Parent 1 0 R /Resources 1047 0 R /Type /Page >>
<< /Annots [ 166 0 R 168 0 R 170 0 R 172 0 R 174 0 R 176 0 R 178 0 R 180 0 R 182 0 R ] /Contents [ 1053 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 1130 799 ] /Parent 1 0 R /Resources 1052 0 R /Type /Page >>
<< /Annots [ 190 0 R 192 0 R 194 0 R 196 0 R ] /Contents [ 1058 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 1130 799 ] /Parent 1 0 R /Resources 1057 0 R /Type /Page >>
Bookmark 1
Bookmark 2
Bookmark 3


Looking at the file manually, I know the three top-level bookmarks are on pages 1 (Bookmark 1), 5 (Bookmark 2) and 6 (Bookmark 3). How can I extract this information using PDFsharp?

Thanks for any pointers.


Posted: Thu Sep 14, 2017 1:13 pm
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
Hi!
hamzas wrote:
Thanks for any pointers.
Those outlines (bookmark entries) may have an Action entry "/A" or a Destination entry "/Dest". The latter contains the page reference directly, the former should be a GoTo action with a page reference.

Analyse the outline elements in the debugger and see which properties allow you to get "/Dest" or "/A".
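To make the two cases concrete, here is a minimal sketch of how the destination could be read from an outline item. The class and method names (OutlineDestinations, GetDestination) are made up for illustration and are not part of PDFsharp. The sketch assumes the destination is an explicit array whose first element is the page reference; named destinations (a name or string instead of an array) are not handled and simply yield null.

```csharp
using PdfSharp.Pdf;

static class OutlineDestinations
{
    // Hypothetical helper: returns the destination array of an outline item,
    // or null if the item has neither an explicit /Dest array nor a GoTo action.
    public static PdfArray GetDestination(PdfDictionary outlineItem)
    {
        // Case 1: the outline item carries the destination directly under /Dest.
        // GetObject resolves an indirect reference; the cast yields null for non-arrays.
        PdfArray dest = outlineItem.Elements.GetObject("/Dest") as PdfArray;
        if (dest != null)
            return dest;

        // Case 2: the outline item has an action under /A.
        // A GoTo action (/S /GoTo) stores its destination under /D.
        PdfDictionary action = outlineItem.Elements.GetDictionary("/A");
        if (action != null && action.Elements.GetName("/S") == "/GoTo")
            return action.Elements.GetObject("/D") as PdfArray;

        return null;
    }
}
```

Calling GetDestination(child) inside a loop over the top-level outline items should give one array per bookmark, or null where the assumptions above do not hold.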

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Thu Sep 14, 2017 2:20 pm

Joined: Thu Sep 14, 2017 12:35 pm
Posts: 3
Thanks for the help, Thomas.

I am not very familiar with the PDF file format but according to this doc (http://www.pdfsharp.net/wiki/WorkOnPdfO ... ample.ashx) I should be looking for "/S" and "/D", perhaps?

Traversing through one of the outline objects and looking for "/A" as you've suggested, I did get the following:

Image

Still don't seem to get page numbers, per se. Is
Code:
iref(39, 0)
the secret? How can I map this to an actual page number?

Cheers.


Posted: Thu Sep 14, 2017 3:33 pm
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
Your page dump does not include the object IDs of the page objects. One of the pages will have the ID "39 0", and its position in the document's page collection tells you the page number in the PDF.

Bookmarks with "/A" are rare, deprecated by Adobe, less compatible, and require more bytes. With most other PDF files you will find the "/Dest" element at the outline.

With outlines, the destination is "/Dest". With actions, the destination is "/D".
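As a rough sketch of that lookup: take the first element of the destination array (the page reference, such as 39 0), then scan the document's pages for the one whose own reference has the same object ID. PageLookup and GetPageNumber are hypothetical names, and the sketch assumes the first array element is an indirect reference to a page object.

```csharp
using PdfSharp.Pdf;
using PdfSharp.Pdf.Advanced;

static class PageLookup
{
    // Hypothetical helper: returns the 1-based page number a destination array
    // points to, or -1 if it cannot be resolved.
    public static int GetPageNumber(PdfDocument document, PdfArray destination)
    {
        if (destination == null || destination.Elements.Count == 0)
            return -1;

        // The first element of an explicit destination is the page reference.
        PdfReference target = destination.Elements[0] as PdfReference;
        if (target == null)
            return -1;

        // Compare object and generation numbers against every page in order.
        for (int i = 0; i < document.PageCount; i++)
        {
            PdfReference pageRef = document.Pages[i].Reference;
            if (pageRef != null
                && pageRef.ObjectNumber == target.ObjectNumber
                && pageRef.GenerationNumber == target.GenerationNumber)
            {
                return i + 1; // Pages are zero-indexed; page numbers start at 1.
            }
        }

        return -1;
    }
}
```

For the file discussed in this thread, the three top-level bookmarks would be expected to resolve to pages 1, 5 and 6; this has not been verified against the actual PDF.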

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Thu Sep 14, 2017 10:40 pm

Joined: Thu Sep 14, 2017 12:35 pm
Posts: 3
Thanks for the pointers Thomas, looks like I have some more digging to do!

