PDFsharp & MigraDoc Foundation


Board index » PDFsharp & MigraDoc » Support

All times are UTC


PDFReader, Position of elements

Moderator: Stefan Lange

[ 3 posts ]


SunsetTowers

Post subject: PDFReader, Position of elements

Posted: Mon Jun 16, 2008 11:26 pm

Joined: Mon Jun 16, 2008 11:06 pm
Posts: 2

Please pardon me; I am new to PDFsharp. I would like to commend those who took the time to create such an incredible tool.

I have done some searching, but so far I haven't found anything that points me in the right direction.

First, I am trying to extract the text from a document so I can look for certain strings. I have managed that, but I also need the position of each match, because the strings may also occur elsewhere in the document. I only care about the occurrences at one particular location; any others can be ignored.

The documents all use the same template, so I'm certain that the string I'm looking for will always be in the same position, or close enough that I can allow for a small amount of movement.
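The "same position, give or take a little" check can be sketched as a simple tolerance filter. This assumes each piece of text has already been extracted together with its position in PDF user-space points (origin at the bottom-left of the page). `TextRun` and `find_at_position` are hypothetical names, not part of PDFsharp, and Python is used here only for illustration:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class TextRun:
    """Hypothetical record: a piece of text and where it is drawn on the page.

    Coordinates are assumed to be PDF user-space points, origin at the
    bottom-left corner of the page.
    """
    text: str
    x: float
    y: float


def find_at_position(runs: Iterable[TextRun], target: str,
                     expected_x: float, expected_y: float,
                     tolerance: float = 5.0) -> List[TextRun]:
    """Return the runs that contain `target` and start within `tolerance`
    points of the expected position; matches anywhere else are ignored."""
    return [
        run for run in runs
        if target in run.text
        and abs(run.x - expected_x) <= tolerance
        and abs(run.y - expected_y) <= tolerance
    ]


if __name__ == "__main__":
    page = [
        TextRun("Invoice No: 12345", 72.0, 700.0),
        TextRun("Invoice No: 12345", 72.0, 120.0),  # same string in a footer
    ]
    # Only the occurrence near (70, 702) is reported.
    print(find_at_position(page, "Invoice No", 70.0, 702.0))
```

The tolerance is a square box around the expected origin. A template with more drift in one direction could use separate x and y tolerances.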

I have iterated through everything I can think of to get at specific elements, and I still haven't been able to find any strings at all.

It may be my lack of understanding of the PDF format, but I assumed that all of the document's elements would be easily accessible, e.g. Document.Page.Elements. A foreach would then take me through the page's elements so I could look for certain types, and each type would carry all the relevant information, such as Location, Text, Font, etc.
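In the PDF format, page text is generally not stored as objects with a location property. It is painted by operators in the page's content stream: `BT`/`ET` delimit a text object, `Td` and `Tm` set the text position, and `Tj` and `TJ` show strings (PDF specification, ISO 32000). The following is not PDFsharp code. It is a minimal Python sketch that pulls `(x, y, text)` triples out of a content stream, under several loudly stated assumptions: the stream is already decompressed; literal strings contain no nested parentheses or octal escapes; hex strings are skipped; bytes are decoded as Latin-1; and positions only follow `Td`/`Tm` translations, ignoring scaling, the CTM and glyph advances. `text_runs` and `unescape` are hypothetical helper names.

```python
import re
from typing import List, Tuple

STRING = rb"\((?:\\.|[^\\)])*\)"          # literal string (...), no nesting
NAME = rb"/[^\s/\[\]()<>]+"               # names such as /F1
NUMBER = rb"[-+]?(?:\d+\.?\d*|\.\d+)"     # integers and reals
OPERATOR = rb"[A-Za-z'\"][A-Za-z0-9*]*"   # BT, Td, Tj, TJ, T*, ...
TOKEN = re.compile(b"|".join([STRING, rb"\[", rb"\]", NAME, NUMBER, OPERATOR]))

ESCAPES = {b"n": "\n", b"r": "\r", b"t": "\t", b"(": "(", b")": ")", b"\\": "\\"}


def unescape(raw: bytes) -> str:
    """Decode a literal string body; assumes a single-byte (Latin-1) encoding."""
    out, i = [], 0
    while i < len(raw):
        c = raw[i:i + 1]
        if c == b"\\" and i + 1 < len(raw):
            nxt = raw[i + 1:i + 2]
            out.append(ESCAPES.get(nxt, nxt.decode("latin-1")))
            i += 2
        else:
            out.append(c.decode("latin-1"))
            i += 1
    return "".join(out)


def text_runs(content: bytes) -> List[Tuple[float, float, str]]:
    """Return (x, y, text) for every Tj/TJ operator in `content`.

    (x, y) is the origin of the current text line, tracked only through
    Td and Tm translations.
    """
    runs: List[Tuple[float, float, str]] = []
    operands: list = []
    array: list = []
    in_array = False
    x = y = 0.0
    for tok in TOKEN.findall(content):
        if tok == b"[":
            in_array, array = True, []
        elif tok == b"]":
            in_array = False
            operands.append(array)
        elif tok.startswith(b"("):
            (array if in_array else operands).append(unescape(tok[1:-1]))
        elif in_array:
            continue  # kerning adjustments inside a TJ array are ignored
        elif re.fullmatch(NUMBER, tok):
            operands.append(float(tok.decode("ascii")))
        elif tok.startswith(b"/"):
            operands.append(tok)
        else:  # an operator consumes the operands collected before it
            if tok == b"BT":
                x = y = 0.0  # BT resets the text matrix to the identity
            elif tok == b"Td" and len(operands) >= 2:
                x, y = x + operands[-2], y + operands[-1]
            elif tok == b"Tm" and len(operands) >= 6:
                x, y = operands[-2], operands[-1]  # e and f of the matrix
            elif tok == b"Tj" and operands:
                runs.append((x, y, operands[-1]))
            elif tok == b"TJ" and operands:
                runs.append((x, y, "".join(operands[-1])))
            operands = []
    return runs


if __name__ == "__main__":
    sample = b"BT /F1 12 Tf 72 700 Td (Invoice No: ) Tj 0 -14 Td [(12)-20(345)] TJ ET"
    print(text_runs(sample))
    # [(72.0, 700.0, 'Invoice No: '), (72.0, 686.0, '12345')]
```

A real PDF library handles compression, font encodings and the full graphics state. The sketch only shows where the position information actually lives.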

If this is possible I would certainly like to know.

The second thing I'm trying to do is to split these same PDF files and merge the pages into separate files based on the incoming data. Thanks to whoever wrote the split and merge routines: that part of the job will be easy to manage and won't take long to code once I get the first part out of the way.

Thanks in advance for any help or assistance that can be provided.


greg7gkb

Post subject:

Posted: Tue Jul 15, 2008 10:35 pm

Joined: Tue Jul 15, 2008 10:12 pm
Posts: 1

Hello,
Has anyone found any more information on this problem? I am looking for the exact same information as in the original post (text objects with location information)...

Any ideas?
Greg


Thomas Hoevel

Post subject:

Posted: Wed Jul 16, 2008 8:20 am

PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3096
Location: Cologne, Germany

Hi!

Depending on the application that created the PDF and on the font, a word may be stored as several small substrings in the PDF file.
What looks like "word" to the human reader may appear as "w", "o", and "rd" (or as "wo" and "rd" or maybe "w" and "ord") in the PDF file.
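One common way to cope with this is to glue fragments back together when they sit on the same baseline and touch horizontally. Below is a minimal Python sketch of that heuristic. `Fragment` and `merge_fragments` are hypothetical names, and each fragment's `width` is assumed to have been computed by the caller from the font's glyph widths and the font size. The tolerances are illustrative values, not recommendations.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    """Hypothetical extracted fragment; `width` is assumed to be supplied
    by the caller from the font metrics."""
    text: str
    x: float      # left edge of the fragment, in points
    y: float      # baseline, in points (PDF y grows upwards)
    width: float  # total advance width of the fragment, in points


def merge_fragments(frags: List[Fragment], line_tol: float = 2.0,
                    gap_tol: float = 1.0) -> List[Fragment]:
    """Glue fragments that share a baseline and (nearly) touch horizontally,
    so that "w" + "o" + "rd" becomes "word"."""
    # 1. Group fragments into lines, top of the page first.
    lines: List[List[Fragment]] = []
    for f in sorted(frags, key=lambda f: -f.y):
        if lines and abs(lines[-1][0].y - f.y) <= line_tol:
            lines[-1].append(f)
        else:
            lines.append([f])

    # 2. Within each line, merge left to right while the gap stays small.
    merged: List[Fragment] = []
    for line in lines:
        current: Optional[Fragment] = None
        for f in sorted(line, key=lambda f: f.x):
            if current is not None and f.x - (current.x + current.width) <= gap_tol:
                current.text += f.text
                current.width = f.x + f.width - current.x
            else:
                current = Fragment(f.text, f.x, f.y, f.width)
                merged.append(current)
    return merged


if __name__ == "__main__":
    frags = [
        Fragment("rd", 20.0, 700.0, 10.0),
        Fragment("w", 10.0, 700.0, 6.0),
        Fragment("o", 16.0, 700.2, 4.0),
        Fragment("next", 50.0, 700.0, 20.0),
        Fragment("line", 10.0, 680.0, 16.0),
    ]
    print([m.text for m in merge_fragments(frags)])  # ['word', 'next', 'line']
```

Grouping lines before sorting by x matters. Fragments of one word can have slightly different baselines, as "o" does here.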

Finding strings may be easier if a fixed-pitch font such as Courier is used.
This does not work with all PDF creators, but most of them will write "word" as one string when a fixed-pitch font is used.
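A fixed pitch also makes positions easy to compute. In the standard Courier font every glyph is 600/1000 of an em wide, so the start of a substring inside a run is the run's x plus the number of preceding characters times that advance. Below is a minimal Python sketch; `substring_x` is a hypothetical helper. It assumes no character spacing (`Tc`), word spacing (`Tw`), horizontal scaling (`Tz`) or `TJ` kerning adjustments are in effect.

```python
from typing import Optional

# Every glyph of the standard Courier font is 600/1000 of an em wide.
COURIER_ADVANCE_EM = 600 / 1000


def substring_x(run_text: str, run_x: float, target: str,
                font_size: float) -> Optional[float]:
    """Return the x position (points) where `target` starts inside a run set
    in Courier at `font_size`, or None if the run does not contain it."""
    index = run_text.find(target)
    if index < 0:
        return None
    # Fixed pitch: each preceding character advances by the same amount.
    return run_x + index * COURIER_ADVANCE_EM * font_size


if __name__ == "__main__":
    # "42" is the 8th character, so it starts 7 * 0.6 * 10 = 42 pt to the right.
    print(substring_x("Total: 42", 72.0, "42", 10.0))
```

With a proportional font the same idea needs the per-glyph widths from the font instead of a single constant.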

_________________
Regards
Thomas Hoevel
PDFsharp Team
