PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
All times are UTC

 Post subject: HTML TO PDF Conversion
Posted: Thu Apr 08, 2010 2:44 pm
Joined: Thu Apr 08, 2010 2:35 pm
Posts: 4
Hi, I would like to know the best strategy for converting an ASP.NET server control's rendered HTML to PDF using PDFsharp. The PDF should look exactly the same as the control looks in the browser.
I have been banging my head against this for the last 3 weeks :(


Posted: Thu Apr 08, 2010 5:21 pm
Joined: Mon Jan 04, 2010 2:44 pm
Posts: 23
You have set yourself an impossible task. PDF is designed for print output and, as such, explicitly specifies physical things like line breaks, page breaks, and word spacing. HTML is designed for screen display and generally says "put line breaks wherever you [the browser, acting on the user's instructions] think best", "Page breaks? What are page breaks?", "kern words however you [the browser again] see fit", and the like.

That said, the best strategy is probably to start with whatever input the ASP.NET server control uses, not the HTML it generates. Parsing strict HTML is a pain; parsing JavaScript is a terror.
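To illustrate the "start from the input" idea, here is a minimal sketch that builds a PDF table straight from the data a grid-like control would display, using MigraDoc. The class name `GridToPdf`, the parameters, and the fixed 4 cm column width are assumptions made for the example, and the renderer constructor shown is the one from the PDFsharp/MigraDoc 1.x releases.

```csharp
using System.Collections.Generic;
using MigraDoc.DocumentObjectModel;
using MigraDoc.DocumentObjectModel.Tables;
using MigraDoc.Rendering;

public static class GridToPdf
{
    // Hypothetical helper: headers and rows are the same data the control is bound to.
    public static void Save(IList<string> headers,
                            IEnumerable<IList<string>> rows,
                            string pdfPath)
    {
        Document document = new Document();
        Section section = document.AddSection();

        Table table = section.AddTable();
        table.Borders.Width = 0.5;

        // Columns have to exist before any row is added.
        for (int i = 0; i < headers.Count; i++)
            table.AddColumn(Unit.FromCentimeter(4)); // assumed fixed width

        // Header row, repeated by MigraDoc on every page the table spans.
        Row headerRow = table.AddRow();
        headerRow.HeadingFormat = true;
        headerRow.Format.Font.Bold = true;
        for (int i = 0; i < headers.Count; i++)
            headerRow.Cells[i].AddParagraph(headers[i]);

        // One table row per data row; extra values beyond the header count are ignored.
        foreach (IList<string> values in rows)
        {
            Row row = table.AddRow();
            for (int i = 0; i < headers.Count && i < values.Count; i++)
                row.Cells[i].AddParagraph(values[i] ?? string.Empty);
        }

        // MigraDoc decides line and page breaks here, not the browser.
        PdfDocumentRenderer renderer = new PdfDocumentRenderer(true);
        renderer.Document = document;
        renderer.RenderDocument();
        renderer.PdfDocument.Save(pdfPath);
    }
}
```

The result will not match the browser pixel for pixel, but the layout is under your control and the text stays selectable.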


Posted: Fri Apr 09, 2010 4:43 am
Joined: Thu Apr 08, 2010 2:35 pm
Posts: 4
Thanks DaleStan,
But is it possible to get a look and feel as close as possible?
Ideally, I would like to write a method:
SaveControlAsPdf(System.Web.UI.Control control)
This method would save the control to PDF.
I have already written some code to get the HTML of the control.
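For readers following along, getting the HTML of a control usually amounts to something like the sketch below. It is only an assumption about what such code looks like, not the poster's actual code, and the class and method names are hypothetical.

```csharp
using System.IO;
using System.Web.UI;

public static class ControlHtml
{
    // Hypothetical helper: returns the markup the control would send to the browser.
    public static string Render(Control control)
    {
        using (StringWriter text = new StringWriter())
        using (HtmlTextWriter html = new HtmlTextWriter(text))
        {
            // RenderControl writes the control and its children to the writer.
            control.RenderControl(html);
            return text.ToString();
        }
    }
}
```

Some controls, GridView for example, refuse to render outside a server form unless the page overrides `VerifyRenderingInServerForm`, so this may need adjusting for the control in question.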
Is there anything like the HTMLParser of iTextSharp?
Thanks in advance


Posted: Fri Apr 09, 2010 10:20 pm
Joined: Mon Dec 07, 2009 8:33 am
Posts: 8
Hello,
I too have been wondering about how to archive web pages.
However, I found a book about archiving web pages.

I don't know what your reason for doing this is, but if you want PDF/A, you cannot link to other directories from within the PDF (it all has to be embedded).

What you could do is take a screenshot of the web page (you can use the WebBrowser control in .NET), then draw the screenshot into the PDF.
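The PDF half of that idea could look roughly like this with PDFsharp. The sketch assumes the screenshot has already been saved to an image file; `ScreenshotToPdf`, `screenshotPath`, and `pdfPath` are hypothetical names, and capturing the screenshot itself is not shown.

```csharp
using PdfSharp.Drawing;
using PdfSharp.Pdf;

public static class ScreenshotToPdf
{
    // Hypothetical helper: places one image on one page of a new PDF.
    public static void Save(string screenshotPath, string pdfPath)
    {
        using (PdfDocument document = new PdfDocument())
        using (XImage image = XImage.FromFile(screenshotPath))
        {
            // Image size in points, derived from its pixel size and resolution.
            double width = image.PointWidth;
            double height = image.PointHeight;

            // Size the page to the image so nothing is cropped or scaled.
            PdfPage page = document.AddPage();
            page.Width = XUnit.FromPoint(width);
            page.Height = XUnit.FromPoint(height);

            using (XGraphics gfx = XGraphics.FromPdfPage(page))
            {
                gfx.DrawImage(image, 0, 0, width, height);
            }

            document.Save(pdfPath);
        }
    }
}
```

A tall web page produces a single, very tall PDF page this way; splitting it across standard page sizes would need extra code.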

The catch is that the result is just a picture, which kind of sucks, but you could add metadata from the HTML file. Otherwise, I guess you will have to write some custom code to "translate" or "parse" the HTML into elements in the PDF. It should be doable, but I would then base it on a framework that supports PDF/A, as it would be a product you could sell.
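Adding metadata is the easy part. A minimal sketch with PDFsharp's document information properties follows; the helper name, its parameters, and the choice of which fields to fill are assumptions for illustration. The title and URL would come from whatever you extract from the HTML.

```csharp
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class PdfMetadata
{
    // Hypothetical helper: writes basic document information into an existing PDF.
    public static void Stamp(string pdfPath, string title, string sourceUrl)
    {
        using (PdfDocument document = PdfReader.Open(pdfPath, PdfDocumentOpenMode.Modify))
        {
            document.Info.Title = title;
            document.Info.Subject = "Archived copy of " + sourceUrl;
            document.Info.Keywords = "web archive, screenshot";

            // Overwrites the file that was opened.
            document.Save(pdfPath);
        }
    }
}
```

This makes the file easier to find and catalogue, but the page content itself is still an image and stays unsearchable.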

I don't know of any frameworks that support PDF/A, though.


Posted: Mon Apr 12, 2010 9:27 am
Joined: Thu Apr 08, 2010 2:35 pm
Posts: 4
Hi,
I do not need any form of linking.
Is there a way to achieve this, then?
Can you please share the book if possible?


Posted: Tue Apr 20, 2010 6:12 am
Joined: Mon Dec 07, 2009 8:33 am
Posts: 8
uday wrote:
Hi,
I do not need any form of linking.
Is there a way to achieve this, then?
Can you please share the book if possible?

Sorry, I haven't had time to read it yet, even though it is not a very large book.
If you still want to know the title, please PM me :-)

