PDFsharp & MigraDoc Foundation • View topic - Examples of how to strip text from PDF?

All times are UTC

Examples of how to strip text from PDF?

Moderator: Stefan Lange

ilyaz

Post subject: Examples of how to strip text from PDF?

Posted: Tue Mar 06, 2012 6:48 pm

Joined: Sat Mar 26, 2011 2:24 am
Posts: 6

I have a fairly simple task: I need to read a PDF file and write out its image contents while ignoring its text contents. So essentially I need to do the complement of "save as text".

Ideally, I would prefer to avoid any sort of re-compression of the image contents, but if that's not possible, that's OK too.

Are there examples of how to do this?

Thanks!

Thomas Hoevel

Post subject: Re: Examples of how to strip text from PDF?

Posted: Wed Mar 07, 2012 8:30 am

PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3101
Location: Cologne, Germany

You can use the Export Images sample to get started, but several special cases are missing there:
http://www.pdfsharp.net/wiki/ExportImages-sample.ashx

Sometimes two filters apply to one image, and code for non-JPEG images is completely missing.
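
To illustrate the two-filter case, here is a minimal sketch in Python rather than PDFsharp code. It assumes you have already pulled the raw stream bytes and the /Filter entry out of the image XObject; the function names and the "jpeg"/"raw" labels are made up for this example. Per the PDF specification, /Filter is either a single name or an array of names, and the filters are undone in the order listed. The sketch ignores /DecodeParms (for example PNG predictors), so treat it as a starting point only.

```python
import zlib


def as_filter_list(filter_entry):
    """Normalize a /Filter value (None, one name, or a list of names) to a list."""
    if filter_entry is None:
        return []
    if isinstance(filter_entry, str):
        return [filter_entry]
    return list(filter_entry)


def decode_image_stream(data, filter_entry):
    """Undo the filters this sketch understands.

    Returns (payload, kind). kind is "jpeg" when the payload is a complete
    JPEG file that can be written to disk without re-compression, and "raw"
    when the payload is unfiltered stream data (for an image, pixel samples).
    """
    filters = as_filter_list(filter_entry)
    for index, name in enumerate(filters):
        if name == "/FlateDecode":
            # Flate is zlib/deflate compression.
            data = zlib.decompress(data)
        elif name == "/DCTDecode":
            # DCT data is a JPEG; leave it compressed so nothing is re-encoded.
            if index != len(filters) - 1:
                raise NotImplementedError("filters after /DCTDecode")
            return data, "jpeg"
        else:
            # LZW, CCITT fax, JBIG2, JPX and so on are not handled here.
            raise NotImplementedError(name)
    return data, "raw"
```

A "raw" result is not an image file yet: the samples still have to be combined with /Width, /Height, /BitsPerComponent and /ColorSpace from the image dictionary to build a bitmap, which is the non-JPEG code the sample lacks.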

_________________
Regards
Thomas Hoevel
PDFsharp Team

ilyaz

Post subject: Re: Examples of how to strip text from PDF?

Posted: Wed Mar 07, 2012 2:35 pm

Joined: Sat Mar 26, 2011 2:24 am
Posts: 6

Thomas Hoevel wrote:

You can use the Export Images sample to get started, but several special cases are missing there:

Yes, I have looked at that example before. The problem is that it only saves "pictures", not the pictorial representation of the text. Is there another example that would show how to loop over all "text" items in a PDF document? I might be able to use that example for the following: Create a Document object from the original PDF and then loop over all the text pieces and either remove the textual contents of these pieces or replace them with something bogus. I would then export the modified Document into a different PDF file. Do you think this might work?
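
To make the idea concrete, here is a rough sketch of what "removing the text pieces" could look like at the content-stream level. It is plain Python, not PDFsharp code, and it assumes the page's content stream has already been decoded to bytes; reading the stream and writing it back with a PDF library is not shown. In a content stream, text is drawn inside text objects delimited by the BT and ET operators, so the sketch drops everything between them. The function name is invented for this example.

```python
WHITESPACE = b"\x00\t\n\x0c\r "
DELIMITERS = b"()<>[]{}/%"


def strip_text_objects(stream):
    """Return a decoded content stream with every BT ... ET block removed."""
    out = bytearray()
    i, n = 0, len(stream)
    in_text = False
    while i < n:
        start = i
        b = stream[i]
        if b == 0x28:  # "(" opens a literal string; skip it as one token
            depth = 1
            i += 1
            while i < n and depth > 0:
                if stream[i] == 0x5C:    # backslash: skip the escaped byte
                    i += 1
                elif stream[i] == 0x28:  # nested "("
                    depth += 1
                elif stream[i] == 0x29:  # ")"
                    depth -= 1
                i += 1
        elif b == 0x25:  # "%" starts a comment that runs to end of line
            while i < n and stream[i] not in b"\r\n":
                i += 1
        elif b == 0x2F:  # "/" starts a name such as /F1 (or even /BT)
            i += 1
            while i < n and stream[i] not in WHITESPACE and stream[i] not in DELIMITERS:
                i += 1
        elif b in WHITESPACE or b in DELIMITERS:
            i += 1
        else:  # a number or an operator keyword
            while i < n and stream[i] not in WHITESPACE and stream[i] not in DELIMITERS:
                i += 1
        token = stream[start:i]
        if token == b"BT":
            in_text = True
        elif token == b"ET":
            in_text = False
        elif not in_text:
            out += token
    return bytes(out)
```

For example, `strip_text_objects(b"/Im0 Do BT /F1 12 Tf (Hi) Tj ET Q")` keeps the image-drawing operators and drops the text block. Known gaps in this sketch: inline images (BI ... ID ... EI) contain binary data that the tokenizer may misread, text inside form XObjects lives in separate streams, and text that was converted to vector outlines is not a text object at all and will survive.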

Thomas Hoevel

Post subject: Re: Examples of how to strip text from PDF?

Posted: Wed Mar 07, 2012 3:24 pm

PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3101
Location: Cologne, Germany

Here is code that extracts text from PDF:
viewtopic.php?p=4010#p4010

Extracting text is a difficult task - also discussed here:
http://stackoverflow.com/a/9161732/162529

_________________
Regards
Thomas Hoevel
PDFsharp Team

ilyaz

Post subject: Re: Examples of how to strip text from PDF?

Posted: Fri Mar 09, 2012 7:26 pm

Joined: Sat Mar 26, 2011 2:24 am
Posts: 6

Thomas, do you have a link to a document that describes the latest version of the PDF format in detail? Or, failing that, an older version? Thx

Thomas Hoevel

Post subject: Re: Examples of how to strip text from PDF?

Posted: Mon Mar 12, 2012 7:58 am

PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3101
Location: Cologne, Germany

Try Adobe:
http://www.adobe.com/devnet/pdf/pdf_reference.html

_________________
Regards
Thomas Hoevel
PDFsharp Team
