PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC


Posted: Sun Aug 20, 2017 3:26 am

Joined: Sun Aug 13, 2017 4:16 am
Posts: 6
Hi,

Maybe you can help me with the following problem.

I receive a very simple PDF file, read it, and set some security settings. So far everyone is happy, but now I have to add bookmarks.
I was thinking of reading the HTML source, trying to work out what is a heading, and building the tree myself. Do you know a better way?

Regards
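
The "read the HTML and build the tree myself" idea can be sketched as follows. This is a minimal sketch in Python (standard library only), not PDFsharp code: the names `HeadingCollector`, `build_outline` and `outline_from_html` are made up for illustration, and it assumes the headings are plain `h1`..`h6` elements in the markup.

```python
from html.parser import HTMLParser

HEADING_TAGS = {f"h{n}": n for n in range(1, 7)}  # HTML defines h1..h6 only


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for every h1..h6 element, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []   # finished (level, text) pairs
        self._level = None   # level of the heading currently being read, if any
        self._parts = []     # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS and self._level is None:
            self._level = HEADING_TAGS[tag]
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and HEADING_TAGS.get(tag) == self._level:
            # Join fragments (e.g. around <em>) and collapse whitespace.
            text = " ".join("".join(self._parts).split())
            self.headings.append((self._level, text))
            self._level = None


def build_outline(headings):
    """Nest a flat (level, text) list into a tree of dicts with 'children'."""
    root = {"level": 0, "title": None, "children": []}
    stack = [root]
    for level, title in headings:
        node = {"level": level, "title": title, "children": []}
        # Pop until the top of the stack is a shallower heading (the parent).
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


def outline_from_html(html):
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return build_outline(collector.headings)
```

Note that this only yields titles and nesting. A bookmark also needs a destination page, and the HTML alone does not say on which PDF page each heading ends up, so the headings still have to be located in the PDF somehow.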


Posted: Sun Aug 20, 2017 5:24 pm
PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 909
Location: CCAA
Hi!
Asugusto wrote:
I was thinking of reading the HTML source [...]
PDF is close to PostScript and far from HTML.
Extracting text from PDF is not trivial. Extracting text with font attributes is a bit more challenging.
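
To illustrate why this is not trivial: text in a PDF page is drawn by operators in a content stream, not stored as tagged elements. The toy sketch below (Python, standard library only) pulls text and the most recently selected font size out of an uncompressed, hand-written content stream. It is an assumption-laden simplification: it only understands `Tf` and simple literal-string `Tj` operations, and the function name `text_runs` is invented here. Real PDFs typically use compressed streams, `TJ` arrays, hex strings, font-specific encodings and text matrices that scale the font, all of which this ignores.

```python
import re

# Matches either a font selection "/Name size Tf" or a simple literal-string
# show operation "(text) Tj". Backslash-escaped characters inside the string
# are allowed; nested unescaped parentheses are not handled.
TOKEN = re.compile(
    r"/(?P<font>\S+)\s+(?P<size>[\d.]+)\s+Tf"
    r"|\((?P<text>(?:\\.|[^\\()])*)\)\s*Tj"
)


def text_runs(content_stream):
    """Return a list of (font_name, font_size, text) for each simple Tj."""
    font, size = None, None
    runs = []
    for m in TOKEN.finditer(content_stream):
        if m.group("font") is not None:
            # Remember the current font; it applies to the following Tj calls.
            font, size = m.group("font"), float(m.group("size"))
        else:
            # Undo the backslash escapes for \( \) and \\ only.
            text = re.sub(r"\\([()\\])", r"\1", m.group("text"))
            runs.append((font, size, text))
    return runs


# Hand-written sample stream (not taken from a real file).
SAMPLE = """BT
/F1 24 Tf
72 720 Td
(Chapter 1) Tj
/F2 11 Tf
0 -30 Td
(Body text starts here.) Tj
ET"""

for run in text_runs(SAMPLE):
    print(run)
```

Even in this simplified form, "heading" is nothing more than a guess based on the font name and size that happened to be active when the text was drawn.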

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)


Posted: Sun Aug 20, 2017 8:19 pm

Joined: Sun Aug 13, 2017 4:16 am
Posts: 6
Hi, thanks for the answer.

I know it is challenging, but I think I have no other option. Let me explain the situation:

All I have is the HTML and a base URL.
So I render the HTML in a browser engine, which fetches the styles and scripts; once the page is rendered, I print it to a simple PDF.

Now I have to add bookmarks for all the headings (H1...H6), but that is complicated :(

I ruled out creating the PDF myself and then injecting the images, graphics, etc.,
because they change continuously, are large, and I guess the result would not match the original HTML.

Any suggestions?


Posted: Sun Aug 20, 2017 9:13 pm
PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 909
Location: CCAA
Asugusto wrote:
All I have is the HTML and a base URL.
So I render the HTML in a browser engine, which fetches the styles and scripts; once the page is rendered, I print it to a simple PDF.
Is the browser engine open source? Is there any chance to intercept the process?

Do you have control over the CSS? This could make things easier, e.g. by giving each heading level a distinct font size or a minimal (invisible) variation of the text colour, so that headings can be recognised in the PDF.

With all PDFs coming from the same PDF generator, things will be a bit simpler.
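
The colour idea could look roughly like this. It is a sketch in Python under stated assumptions: the helper names `marker_css` and `level_from_pdf_rgb` are hypothetical, body text is assumed to be pure black, and the PDF generator is assumed to write the CSS colour unchanged as an RGB fill colour (components in the range 0..1). Whether a given browser engine preserves such small colour differences would have to be checked on real output.

```python
def marker_css(levels=range(1, 7)):
    """Build CSS giving each heading level its own near-black colour."""
    rules = []
    for level in levels:
        # The blue channel carries the level:
        # rgb(0, 0, 1) for h1 ... rgb(0, 0, 6) for h6.
        rules.append(f"h{level} {{ color: rgb(0, 0, {level}) !important; }}")
    return "\n".join(rules)


def level_from_pdf_rgb(r, g, b, max_level=6):
    """Map a PDF fill colour (components 0..1) to a heading level, or None."""
    blue = round(b * 255)
    if round(r * 255) == 0 and round(g * 255) == 0 and 1 <= blue <= max_level:
        return blue
    return None


print(marker_css())
```

The stylesheet would be injected before printing. When scanning the PDF afterwards, any text drawn with one of these marker colours can be treated as a heading of the corresponding level, and the page it appears on gives the bookmark destination.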



Posted: Sun Aug 20, 2017 9:28 pm

Joined: Sun Aug 13, 2017 4:16 am
Posts: 6
Yes, I'm using ChromiumWebBrowser from CefSharp (https://github.com/cefsharp/CefSharp). I will have to look into it, but I think I can intercept the process.

As for control over the CSS: yes, I could have that. But I don't see how it would help me. Can you explain?


Regards

