PDFsharp & MigraDoc Foundation forum

All times are UTC

Detect headings and create bookmarks from existing pdf

Moderator: Stefan Lange

Asugusto

Post subject: Detect headings and create bookmarks from existing pdf

Posted: Sun Aug 20, 2017 3:26 am

Joined: Sun Aug 13, 2017 4:16 am
Posts: 6

Hi,

Maybe you can help me with the following problem.

I receive a very simple PDF file. I read the file and set some security settings, and everyone is happy so far, but now I have to add bookmarks.
I was thinking of reading the HTML code, interpreting what is a heading, and building the bookmark tree myself. Do you know a better way?

Regards
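Since the HTML is available, the "read the HTML and interpret the headings" idea can at least be sketched. This is a minimal Python sketch using only the standard library's `html.parser`, assuming the headings are ordinary `<h1>` to `<h6>` elements. It recovers each heading's level and text, but not the PDF page the heading ends up on; that still has to be matched against the rendered PDF. `HeadingCollector` and `extract_headings` are hypothetical names for illustration only.

```python
from html.parser import HTMLParser

HEADING_TAGS = {f"h{n}": n for n in range(1, 7)}  # "h1".."h6" -> level 1..6


class HeadingCollector(HTMLParser):
    """Collects (level, text) for every <h1>..<h6> element, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []  # list of (level, text)
        self._level = None  # level of the heading currently open, if any
        self._parts = []    # text fragments seen inside the current heading

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase, so <H2> matches "h2".
        if tag in HEADING_TAGS and self._level is None:
            self._level = HEADING_TAGS[tag]
            self._parts = []

    def handle_endtag(self, tag):
        if self._level is not None and HEADING_TAGS.get(tag) == self._level:
            text = " ".join("".join(self._parts).split())  # collapse whitespace
            if text:
                self.headings.append((self._level, text))
            self._level = None

    def handle_data(self, data):
        # Text of nested inline elements (<em>, <span>, ...) is kept as well.
        if self._level is not None:
            self._parts.append(data)


def extract_headings(html):
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings


if __name__ == "__main__":
    sample = "<h1>Intro</h1><p>text</p><h2>Scope <em>and</em> goals</h2><h2>Setup</h2>"
    print(extract_headings(sample))
    # [(1, 'Intro'), (2, 'Scope and goals'), (2, 'Setup')]
```

Headings produced by scripts at render time would not appear in the raw HTML, so this only covers markup that is already present before rendering.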

TH-Soft

Post subject: Re: Detect headings and create bookmarks from existing pdf

Posted: Sun Aug 20, 2017 5:24 pm

PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 913
Location: CCAA

Hi!

Asugusto wrote:

I was thinking of reading the HTML code [...]

PDF is close to PostScript and far from HTML.
Extracting text from PDF is not trivial. Extracting text with font attributes is a bit more challenging.

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)

Asugusto

Post subject: Re: Detect headings and create bookmarks from existing pdf

Posted: Sun Aug 20, 2017 8:19 pm

Hi, thanks for the answer.

I know it is challenging, but I think I have no other option. Let me explain the situation:

All I have is the HTML and a base URL.
So I'm rendering the HTML in a browser engine that fetches the styles and scripts; once the page is rendered, I print it to a simple PDF.

Now I have to add bookmarks for all the headings (H1 to H6), but that is complicated.

I ruled out creating the PDF myself and then injecting the images, graphics, etc., because they are continuously changing and large, and I guess the result would not match the original HTML.

Any suggestion?
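Whichever way the headings are found, they arrive as a flat, document-ordered list, and the bookmark tree has to be rebuilt from their levels. This is a minimal Python sketch of that step, assuming each heading is already known as a hypothetical `(level, title, page)` tuple. The `Bookmark` class and `build_outline` are illustrative names, not a PDFsharp API. A heading becomes a child of the nearest preceding heading with a smaller level, so skipped levels (H1 followed directly by H3) still nest sensibly.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    title: str
    page: int   # 1-based page number (assumed to be known already)
    level: int  # 1 for H1, 2 for H2, ...
    children: list = field(default_factory=list)


def build_outline(headings):
    """Nest a flat, document-ordered list of (level, title, page) into a tree."""
    roots = []
    stack = []  # currently open ancestors; levels strictly increase upwards
    for level, title, page in headings:
        node = Bookmark(title, page, level)
        # Close every open heading at the same or a deeper level.
        while stack and stack[-1].level >= level:
            stack.pop()
        (stack[-1].children if stack else roots).append(node)
        stack.append(node)
    return roots


def dump(nodes, indent=0):
    """Print the tree with two spaces of indentation per level."""
    for n in nodes:
        print("  " * indent + f"{n.title} (p. {n.page})")
        dump(n.children, indent + 1)


if __name__ == "__main__":
    flat = [(1, "Intro", 1), (2, "Scope", 1), (3, "Details", 2),
            (1, "Setup", 3), (3, "Tools", 3)]
    dump(build_outline(flat))
    # Intro (p. 1)
    #   Scope (p. 1)
    #     Details (p. 2)
    # Setup (p. 3)
    #   Tools (p. 3)
```

The resulting tree maps naturally onto PDFsharp's outline collection (`PdfDocument.Outlines`, with child entries added to the parent outline's own `Outlines`). That C# side is not shown here.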

TH-Soft

Post subject: Re: Detect headings and create bookmarks from existing pdf

Posted: Sun Aug 20, 2017 9:13 pm

Asugusto wrote:

All I have is the HTML and a base URL.
So I'm rendering the HTML in a browser engine that fetches the styles and scripts; once the page is rendered, I print it to a simple PDF.

Is that engine open source? Any chance to intercept the process?

Do you have control over the CSS? That could make things easier, e.g. by giving each heading level a distinct font size or a minimal (invisible) variation of the text colour.

With all PDFs coming from the same PDF generator, things will be a bit simpler.
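To illustrate why control over the CSS helps: if each heading level is rendered with a font size or colour that no body text uses, a heading can be recognised from the attributes of the extracted text alone, without understanding the page layout. This is a minimal Python sketch under stated assumptions. `TextRun` is a hypothetical record for whatever a text-extraction step reports (it is not a PDFsharp type), and the size and colour tables are hypothetical values that would have to match the CSS and the sizes actually measured in the printed PDF.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """Hypothetical text run as some PDF text extractor might report it."""
    text: str
    font_size: float  # in PDF points, as measured in the output
    color: tuple      # (r, g, b), 0-255
    page: int         # 1-based


# Hypothetical convention: each heading level gets its own font size ...
SIZE_TO_LEVEL = {24.0: 1, 20.0: 2, 16.0: 3}
# ... or a near-black colour that differs from the body text by one step.
COLOR_TO_LEVEL = {(1, 0, 0): 1, (2, 0, 0): 2, (3, 0, 0): 3}


def heading_level(run, tolerance=0.25):
    """Return the heading level encoded in a run, or None for body text."""
    if run.color in COLOR_TO_LEVEL:
        return COLOR_TO_LEVEL[run.color]
    for size, level in SIZE_TO_LEVEL.items():
        # Allow a little slack in case sizes are rounded or scaled on printing.
        if abs(run.font_size - size) <= tolerance:
            return level
    return None


def detect_headings(runs):
    """Turn document-ordered runs into (level, title, page) tuples.

    Consecutive heading runs with the same level on the same page are merged,
    because one heading is often split into several runs. (Two distinct
    headings of the same level with nothing in between would also be merged.)
    """
    result = []
    prev = None  # (level, page) of the previous run if it was a heading
    for run in runs:
        level = heading_level(run)
        if level is None:
            prev = None
            continue
        if prev == (level, run.page):
            lvl, title, page = result[-1]
            result[-1] = (lvl, f"{title} {run.text.strip()}", page)
        else:
            result.append((level, run.text.strip(), run.page))
        prev = (level, run.page)
    return result


if __name__ == "__main__":
    runs = [TextRun("Getting", 24.0, (0, 0, 0), 1),
            TextRun("started", 24.0, (0, 0, 0), 1),
            TextRun("Body text", 12.0, (0, 0, 0), 1),
            TextRun("Install", 12.0, (2, 0, 0), 2)]
    print(detect_headings(runs))
    # [(1, 'Getting started', 1), (2, 'Install', 2)]
```

The colour variant has the advantage that headings keep their visual design, at the cost of relying on the extractor reporting fill colours exactly.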

Asugusto

Post subject: Re: Detect headings and create bookmarks from existing pdf

Posted: Sun Aug 20, 2017 9:28 pm

Yes, I'm using ChromiumWebBrowser from CefSharp (https://github.com/cefsharp/CefSharp). I still have to look into it, but I think I can intercept it.

As for the CSS, yes, I could have control over it. But I don't see how that would help me. Can you explain?

Regards
