PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
All times are UTC

Posted: Tue Mar 06, 2012 6:48 pm

Joined: Sat Mar 26, 2011 2:24 am
Posts: 6
I have a fairly simple task: I need to read a PDF file and write out its image contents while ignoring its text contents. So essentially I need to do the complement of "save as text".

Ideally, I would prefer to avoid any re-compression of the image contents, but if that's not possible, re-compression is OK too.

Are there examples of how to do it?

Thanks!


Posted: Wed Mar 07, 2012 8:30 am
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3101
Location: Cologne, Germany
You can use the Export Images sample to get started, but several special cases are missing there:
http://www.pdfsharp.net/wiki/ExportImages-sample.ashx

Sometimes two filters apply to one image and code for non-JPEG images is completely missing.
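
To illustrate the two-filter case, here is a minimal sketch in Python (standard library only, not PDFsharp code). It assumes you have already read the image stream's /Filter entry and its raw bytes by some other means; the function name `recover_jpeg` and the convention of passing filter names with a leading slash are assumptions made for this example. It relies on the PDF rule that an array of filters is decoded in the order listed, so a stream filtered with `[/FlateDecode /DCTDecode]` yields a plain JPEG file once the Flate layer is removed, with no re-compression. Streams that need /DecodeParms, or that hold raw pixel samples, are not covered.

```python
import zlib


def recover_jpeg(filters, data):
    """Return JPEG file bytes for an image stream, or None if this sketch cannot.

    filters: a single filter name such as "/DCTDecode", or a list of names in
             the order the PDF lists them (which is the decoding order).
    data:    the raw, still-encoded stream bytes.
    """
    if isinstance(filters, str):
        filters = [filters]

    for position, name in enumerate(filters):
        if name == "/DCTDecode":
            # DCT-encoded data is a complete JPEG file; write it out as-is.
            # Only accept it when no further filter follows.
            return data if position == len(filters) - 1 else None
        if name == "/FlateDecode":
            # Remove the outer zlib/deflate layer and look at the next filter.
            data = zlib.decompress(data)
        else:
            # Any other filter (LZW, CCITT, JBIG2, ...) is outside this sketch.
            return None

    # No /DCTDecode found: the stream holds raw samples, not a JPEG.
    return None
```

A stream with only /FlateDecode returns None here: its bytes are raw samples, and turning them into an image file additionally requires the width, height, bit depth and color space from the image dictionary.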

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Wed Mar 07, 2012 2:35 pm

Joined: Sat Mar 26, 2011 2:24 am
Posts: 6
Thomas Hoevel wrote:
You can use the Export Images sample to get started, but several special cases are missing there:


Yes, I have looked at that example before. The problem is that it only saves "pictures", not the pictorial representation of the text. Is there another example that shows how to loop over all the "text" items in a PDF document? I might be able to use it as follows: create a Document object from the original PDF, loop over all the text pieces, and either remove their textual contents or replace them with something bogus. I would then save the modified Document to a different PDF file. Do you think this might work?
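
The "remove the text pieces" idea can be sketched at the content-stream level. In a PDF page, text is drawn inside text objects delimited by the `BT` and `ET` operators, so dropping everything between them leaves the image and vector drawing operators in place. The Python sketch below (standard library only, not PDFsharp code) works on an already-decoded content stream; the names `strip_text_objects` and `_skip_literal_string` are hypothetical. It is a simplification: comments and inline images (`BI ... EI`) are not parsed, and it removes the text from the page entirely rather than keeping its visual appearance.

```python
WHITESPACE = b"\x00\t\n\x0c\r "
DELIMITERS = b"()<>[]{}/%"


def _skip_literal_string(data, start):
    """Return the index just past the literal string that opens at data[start]."""
    depth = 0
    i = start
    while i < len(data):
        ch = data[i]
        if ch == 0x5C:  # backslash: the next byte is escaped
            i += 2
            continue
        if ch == 0x28:  # '(' opens a (possibly nested) level
            depth += 1
        elif ch == 0x29:  # ')' closes a level
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return len(data)  # unterminated string: consume the rest


def strip_text_objects(content):
    """Return the content stream with every BT ... ET text object removed."""
    out = bytearray()
    in_text = False
    i = 0
    while i < len(content):
        ch = content[i]
        if ch == 0x28:
            # Treat a whole literal string as one token so that "BT" or "ET"
            # appearing inside shown text is never mistaken for an operator.
            end = _skip_literal_string(content, i)
        elif ch in WHITESPACE or ch in DELIMITERS:
            end = i + 1
        else:
            # A run of regular characters: a number, a name body or an operator.
            end = i
            while (end < len(content)
                   and content[end] not in WHITESPACE
                   and content[end] not in DELIMITERS):
                end += 1
        token = content[i:end]
        if token == b"BT":
            in_text = True
        elif token == b"ET" and in_text:
            in_text = False
        elif not in_text:
            out += token
        i = end
    return bytes(out)
```

Content streams are usually compressed (most often with /FlateDecode), so in practice the stream has to be decoded before filtering and written back, with its /Length updated, afterwards.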


Posted: Wed Mar 07, 2012 3:24 pm
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3101
Location: Cologne, Germany
Here is code that extracts text from PDF:
viewtopic.php?p=4010#p4010

Extracting text is a difficult task - also discussed here:
http://stackoverflow.com/a/9161732/162529

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Fri Mar 09, 2012 7:26 pm

Joined: Sat Mar 26, 2011 2:24 am
Posts: 6
Thomas, do you have a link to a document that describes the latest version of the PDF format in detail? Or some older version? Thx


Posted: Mon Mar 12, 2012 7:58 am
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3101
Location: Cologne, Germany
Try Adobe:
http://www.adobe.com/devnet/pdf/pdf_reference.html

_________________
Regards
Thomas Hoevel
PDFsharp Team

