PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

Examples of how to strip text from PDF?
https://forum.pdfsharp.net/viewtopic.php?f=2&t=1934

Author:  ilyaz [ Tue Mar 06, 2012 6:48 pm ]
Post subject:  Examples of how to strip text from PDF?

I have a fairly simple task: I need to read a PDF file and write out its image contents while ignoring its text contents. So essentially I need to do the complement of "save as text".

Ideally, I would prefer to avoid any sort of re-compression of the image contents but if it's not possible, it's ok too.

Are there examples of how to do it?

Thanks!

Author:  Thomas Hoevel [ Wed Mar 07, 2012 8:30 am ]
Post subject:  Re: Examples of how to strip text from PDF?

You can use the Export Images sample to get started, but several special cases are missing there:
http://www.pdfsharp.net/wiki/ExportImages-sample.ashx

Sometimes two filters apply to one image, and the sample has no code at all for non-JPEG images.
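
To illustrate the two-filter case: in the PDF format, the entries of a stream's /Filter array are applied in the listed order when decoding, so an image stored as [/FlateDecode /DCTDecode] is a JPEG file wrapped in a zlib stream. Below is a minimal sketch in Python (it is not PDFsharp code, and unwrap_image_stream with its parameters is a hypothetical name chosen for illustration). It undoes the lossless outer filter and hands back the JPEG bytes untouched, so nothing is re-compressed. It assumes the raw stream bytes and the filter names have already been read from the image dictionary.

```python
import zlib
from typing import Sequence, Tuple


def unwrap_image_stream(data: bytes, filters: Sequence[str]) -> Tuple[bytes, str]:
    """Undo lossless filters in order and stop at the JPEG payload.

    Hypothetical helper: returns (payload, kind), where kind is "jpeg"
    when the payload can be written to a .jpg file as-is, or "raw" when
    it is uncompressed sample data.
    """
    for index, name in enumerate(filters):
        if name == "/FlateDecode":
            # zlib/deflate wrapper; PNG/TIFF predictors (DecodeParms)
            # are NOT handled in this sketch.
            data = zlib.decompress(data)
        elif name == "/DCTDecode":
            # What remains is a JPEG file; keep it byte for byte.
            if index != len(filters) - 1:
                raise NotImplementedError("filters after /DCTDecode")
            return data, "jpeg"
        else:
            # LZWDecode, CCITTFaxDecode, JPXDecode, ... are left out here.
            raise NotImplementedError(name)
    return data, "raw"
```

When the result is "raw", the bytes are bare pixel samples. Turning them into a viewable file additionally needs /Width, /Height, /BitsPerComponent and /ColorSpace from the image dictionary, which is the non-JPEG part that the sample leaves out.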

Author:  ilyaz [ Wed Mar 07, 2012 2:35 pm ]
Post subject:  Re: Examples of how to strip text from PDF?

Thomas Hoevel wrote:
You can use the Export Images sample to get started, but several special cases are missing there:


Yes, I have looked at that example before. The problem is that it only saves "pictures", not the pictorial representation of the text. Is there another example that would show how to loop over all "text" items in a PDF document? I might be able to use that example for the following: Create a Document object from the original PDF and then loop over all the text pieces and either remove the textual contents of these pieces or replace them with something bogus. I would then export the modified Document into a different PDF file. Do you think this might work?

Author:  Thomas Hoevel [ Wed Mar 07, 2012 3:24 pm ]
Post subject:  Re: Examples of how to strip text from PDF?

Here is code that extracts text from PDF:
https://forum.pdfsharp.net/viewtopic.php?p=4010#p4010

Extracting text is a difficult task - also discussed here:
http://stackoverflow.com/a/9161732/162529
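
For the opposite direction (keeping the graphics and dropping the text), one possible approach is to walk the operators of a page's decoded content stream and leave out the text-showing operators Tj, TJ, ' and " together with their operands. The sketch below shows the idea in Python on a plain byte string. It is not PDFsharp code: read_token and strip_text are hypothetical names, and fetching the decoded stream from the page and writing it back are not shown. It also does not touch text inside Form XObjects or text that was drawn as vector paths, and the end of an inline image is found with a simple heuristic.

```python
import re

TEXT_SHOWING = {b"Tj", b"TJ", b"'", b'"'}
NON_OPERATORS = {b"true", b"false", b"null"}
WHITESPACE = b"\x00\t\n\x0c\r "
DELIMITERS = b"()<>[]{}/%"
END_OF_LINE = re.compile(rb"[\r\n]")
INLINE_IMAGE_END = re.compile(rb"\sEI(?=\s|$)")  # heuristic only


def read_token(data: bytes, pos: int):
    """Return (token, next_pos) for the token that starts at data[pos]."""
    ch = data[pos:pos + 1]
    if ch == b"(":  # literal string: balanced parentheses, backslash escapes
        depth, i = 0, pos
        while i < len(data):
            c = data[i:i + 1]
            if c == b"\\":
                i += 1  # skip the escaped byte
            elif c == b"(":
                depth += 1
            elif c == b")":
                depth -= 1
                if depth == 0:
                    return data[pos:i + 1], i + 1
            i += 1
        return data[pos:], len(data)  # unterminated string: take the rest
    if data[pos:pos + 2] in (b"<<", b">>"):  # dictionary delimiters
        return data[pos:pos + 2], pos + 2
    if ch == b"<":  # hex string
        end = data.find(b">", pos)
        end = len(data) if end < 0 else end + 1
        return data[pos:end], end
    if ch in (b"[", b"]", b"{", b"}", b")", b">"):
        return ch, pos + 1
    # Name (leading slash), number or operator: runs to the next
    # whitespace or delimiter byte.
    i = pos + 1
    while i < len(data) and data[i] not in WHITESPACE and data[i] not in DELIMITERS:
        i += 1
    return data[pos:i], i


def strip_text(data: bytes) -> bytes:
    """Rebuild a decoded content stream without text-showing operators."""
    out, operands, pos = [], [], 0
    while pos < len(data):
        if data[pos] in WHITESPACE:
            pos += 1
            continue
        if data[pos:pos + 1] == b"%":  # comment: skip to end of line
            m = END_OF_LINE.search(data, pos)
            pos = m.end() if m else len(data)
            continue
        token, pos = read_token(data, pos)
        is_operator = (token[:1].isalpha() or token in (b"'", b'"')) \
            and token not in NON_OPERATORS
        if not is_operator:
            operands.append(token)  # collect operands until the operator
            continue
        if token not in TEXT_SHOWING:
            out.append(b" ".join(operands + [token]))
        operands = []  # operands of a dropped operator are dropped too
        if token == b"ID":  # inline image: copy binary data through EI
            m = INLINE_IMAGE_END.search(data, pos)
            end = m.end() if m else len(data)
            out[-1] += data[pos:end]
            pos = end
    out.append(b" ".join(operands))  # trailing operands, normally empty
    return b"\n".join(out)
```

The BT/ET brackets and the other operators inside a text object are kept on purpose: colour and graphics-state changes made inside a text object stay in effect after ET, so deleting whole BT ... ET blocks could change how later graphics are painted.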

Author:  ilyaz [ Fri Mar 09, 2012 7:26 pm ]
Post subject:  Re: Examples of how to strip text from PDF?

Thomas, do you have a link to a document that describes the latest version of the PDF format in detail? Or some older version? Thx

Author:  Thomas Hoevel [ Mon Mar 12, 2012 7:58 am ]
Post subject:  Re: Examples of how to strip text from PDF?

Try Adobe:
http://www.adobe.com/devnet/pdf/pdf_reference.html

All times are UTC