PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

Detect headings and create bookmarks from existing pdf
https://forum.pdfsharp.net/viewtopic.php?f=2&t=3643

Author:  Asugusto [ Sun Aug 20, 2017 3:26 am ]
Post subject:  Detect headings and create bookmarks from existing pdf

Hi,

Maybe you can help me with the following problem...

I receive a very simple PDF file. I read the file and set some security settings, and everyone is happy so far, but now I have to add bookmarks.
I was thinking of reading the HTML source, trying to interpret what is a heading, and building the tree myself. Do you know a better way?
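A minimal sketch of the idea above, using only Python's standard-library `html.parser`: it lists the `<h1>` through `<h6>` headings of the static HTML in document order. The names `HeadingCollector` and `extract_headings` are illustrative, not part of any library. It does not see headings that scripts add after the page loads, and it cannot tell on which PDF page each heading ends up; that part still has to come from the renderer or from the PDF itself.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for every <h1>..<h6> element, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []   # list of (level, text) tuples
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case, so "H2" arrives as "h2".
        if tag in HEADING_TAGS and self._level is None:
            self._level = int(tag[1])
            self._parts = []

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            # Join the fragments (e.g. text split by <b> or <a>) and collapse whitespace.
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)


def extract_headings(html):
    """Return the headings of an HTML string as a list of (level, text)."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings
```

For example, `extract_headings("<h1>Intro</h1><h2>Setup <b>steps</b></h2>")` returns `[(1, 'Intro'), (2, 'Setup steps')]`.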

Regards

Author:  TH-Soft [ Sun Aug 20, 2017 5:24 pm ]
Post subject:  Re: Detect headings and create bookmarks from existing pdf

Hi!
Asugusto wrote:
I was thinking of reading the HTML source [...]
PDF is close to PostScript and far from HTML.
Extracting text from PDF is not trivial. Extracting text with font attributes is a bit more challenging.
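Once text runs with their font sizes are available (getting them out of the PDF is the hard part described above), one common heuristic is to treat the most frequent font size as body text and rank the larger sizes as heading levels. A sketch under that assumption; `TextRun` and `classify_by_font_size` are hypothetical names standing in for whatever a text extractor returns, not an API of PDFsharp or any specific library.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class TextRun:
    # Hypothetical record from some PDF text extractor (not a real library type).
    text: str
    font_size: float
    page: int


def classify_by_font_size(runs, max_levels=6):
    """Return (level, text, page) for runs set larger than the body text.

    The most common font size is assumed to be body text; the distinct
    larger sizes are ranked so that the largest becomes level 1.
    """
    if not runs:
        return []
    body_size = Counter(r.font_size for r in runs).most_common(1)[0][0]
    heading_sizes = sorted(
        {r.font_size for r in runs if r.font_size > body_size}, reverse=True
    )[:max_levels]
    level_of = {size: i + 1 for i, size in enumerate(heading_sizes)}
    return [(level_of[r.font_size], r.text, r.page)
            for r in runs if r.font_size in level_of]
```

Real extractors may report slightly different sizes for the same style (for example 15.96 and 16), so rounding the sizes before counting may be needed, and headings that differ from body text only by weight would not be detected this way.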

Author:  Asugusto [ Sun Aug 20, 2017 8:19 pm ]
Post subject:  Re: Detect headings and create bookmarks from existing pdf

Hi, thanks for the answer

I know it is challenging, but I think I have no other option. Let me explain the situation:

All I have is the HTML and a base URL.
So I render the HTML in a browser engine that fetches the styles and scripts; once the page is rendered, I print it to a simple PDF.

Now I have to add bookmarks for all the headings (H1...H6), but it is complicated :(
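Whatever the source of the heading list, nesting it into a bookmark tree is a small stack-based algorithm. A sketch, assuming the headings arrive as `(level, title)` pairs in reading order; `OutlineNode` and `build_outline` are hypothetical names. Each heading becomes a child of the nearest earlier heading with a smaller level, so a skipped level (H1 followed directly by H3) simply nests one step deeper. Each resulting node would then map to one bookmark (PDFsharp exposes bookmarks through `PdfDocument.Outlines`); finding the target page for each bookmark remains a separate problem.

```python
from dataclasses import dataclass, field


@dataclass
class OutlineNode:
    title: str
    level: int
    children: list = field(default_factory=list)


def build_outline(headings):
    """Nest a flat list of (level, title) pairs into a tree of OutlineNode."""
    root = OutlineNode(title="", level=0)  # level 0 is never popped
    stack = [root]  # path from the root to the most recently added node
    for level, title in headings:
        # Pop until the top of the stack is a valid parent for this level.
        while stack[-1].level >= level:
            stack.pop()
        node = OutlineNode(title=title, level=level)
        stack[-1].children.append(node)
        stack.append(node)
    return root.children
```

For `[(1, "A"), (2, "A.1"), (3, "A.1.a"), (2, "A.2"), (1, "B"), (3, "B.x")]` this yields two top-level nodes, `A` (with `A.1` and `A.2`) and `B`, with `B.x` placed directly under `B`.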

I ruled out creating the PDF myself and then injecting the images, graphics, etc.,
because they change continuously, they are large, and I guess the result would not match the original HTML...

Any suggestion?

Author:  TH-Soft [ Sun Aug 20, 2017 9:13 pm ]
Post subject:  Re: Detect headings and create bookmarks from existing pdf

Asugusto wrote:
All I have is the HTML and a base URL.
So I render the HTML in a browser engine that fetches the styles and scripts; once the page is rendered, I print it to a simple PDF.
Is it open source? Is there any chance to intercept the process?

Do you have control over the CSS? This could make things easier, e.g. by giving headings distinct font sizes or minimal (invisible) variations of the text colour.
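To illustrate the colour idea: each heading level could be drawn in a colour that looks black but carries the level in the blue channel, so the extractor only has to read the fill colour of each text run. A sketch of a hypothetical scheme in which level n uses `rgb(0, 0, n)`; it assumes the browser's print-to-PDF keeps that exact colour value, which should be checked on a real output. RGB fill colours set with the PDF `rg` operator use components between 0 and 1, hence the scaling in the decoder.

```python
# Hypothetical marker scheme: heading level n is drawn in rgb(0, 0, n),
# which looks black but can be read back from the PDF content stream.
MAX_LEVEL = 6


def heading_css(max_level=MAX_LEVEL):
    """Return CSS rules that give h1..hN their marker colours."""
    return "\n".join(
        f"h{n} {{ color: rgb(0, 0, {n}); }}" for n in range(1, max_level + 1)
    )


def level_from_pdf_rgb(r, g, b, max_level=MAX_LEVEL):
    """Map an RGB fill colour with 0..1 components (as used by the PDF 'rg'
    operator) back to a heading level, or None for ordinary text."""
    r8, g8, b8 = (round(c * 255) for c in (r, g, b))
    if r8 == 0 and g8 == 0 and 1 <= b8 <= max_level:
        return b8
    return None
```

Because the level is read from the colour operands in the content stream rather than from rendered pixels, a difference of 1/255 is enough; text inside elements that set their own colour (links, for example) would not carry the marker.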

With all PDFs coming from the same PDF generator, things will be a bit simpler.

Author:  Asugusto [ Sun Aug 20, 2017 9:28 pm ]
Post subject:  Re: Detect headings and create bookmarks from existing pdf

Yes, I'm using ChromiumWebBrowser from CefSharp (https://github.com/cefsharp/CefSharp). I still have to look into it, but I think I can intercept it.

As for the CSS: yes, I could have control over it. But I don't see how that would help me. Can you explain?


Regards

All times are UTC