PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC


Posted: Mon Jun 16, 2008 11:26 pm

Joined: Mon Jun 16, 2008 11:06 pm
Posts: 2
Please pardon me; I am new to PDFsharp. I would like to commend those who have taken the time to create such an incredible tool.

I have done some searching, but so far I haven't been able to find anything that leads me in the right direction.

First, I am trying to extract the text of a document to look for certain strings. I have been able to do this, but I also need to find the position of each string I find, because the strings may also occur in other places in the document. Occurrences elsewhere do not matter to me.

The documents all follow the same template, so I'm certain that the string I'm looking for will always be in the same position, or close enough that I can allow for a small amount of movement.
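A minimal Python sketch of that position check, assuming the text has already been extracted as fragments with page coordinates and that each target string arrives within a single fragment (not guaranteed, see the reply below). The `Fragment` type, its field names, the sample text and the 5-unit tolerance are all illustrative assumptions, not PDFsharp API:

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A piece of extracted text and the point where it starts (PDF units)."""
    text: str
    x: float
    y: float


def find_at_position(fragments, target, expected_x, expected_y, tolerance=5.0):
    """Return the fragments containing `target` whose start point lies within
    `tolerance` units of the expected position; matches elsewhere are ignored."""
    return [
        f for f in fragments
        if target in f.text
        and abs(f.x - expected_x) <= tolerance
        and abs(f.y - expected_y) <= tolerance
    ]


# Hypothetical sample: the same word occurs twice, only the first is wanted.
fragments = [
    Fragment("Invoice 1234", 72.0, 700.0),
    Fragment("See Invoice 1234 above", 72.0, 300.0),
]
print(find_at_position(fragments, "Invoice", 70.0, 702.0))
# [Fragment(text='Invoice 1234', x=72.0, y=700.0)]
```

The tolerance box absorbs the "small amount of movement" between documents generated from the same template.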

I have iterated through everything I can think of to get at specific elements, and I have still been unable to find any strings at all.

It may be my lack of understanding of the PDF format, but I assumed that all the elements of the document would be easily accessible, e.g. Document.Page.Elements. A foreach loop would then take me through the page's elements, allowing me to look for certain types, and those types would have all the relevant information such as Location, Text, Font, etc.

If this is possible, I would certainly like to know.
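PDF pages do not store a tree of elements with Location and Text properties. As defined in the PDF specification, a page's content is a stream of operators: `BT`/`ET` delimit a text object, `Tm` and `Td` set the text position, and `Tj`/`TJ` show strings. The Python sketch below recovers (x, y, text) fragments from such a stream. It is not PDFsharp's API; it assumes the stream has already been decompressed and contains only literal strings without escapes or nested parentheses, and it ignores the current transformation matrix (`cm`), font encodings and glyph advances, all of which real files may require:

```python
import re

# Tokens of a simplified, already-decompressed content stream (assumption:
# literal strings have no escapes or nested parentheses; hex strings,
# inline images and the ' and " operators are not handled).
TOKEN = re.compile(
    r"\((?P<str>[^()]*)\)"                  # literal string such as (Hello)
    r"|(?P<name>/[^\s/\[\]()<>]+)"          # name such as /F1
    r"|(?P<num>[+-]?(?:\d+\.?\d*|\.\d+))"   # integer or real number
    r"|(?P<op>[A-Za-z]+\*?)"                # operator such as BT, Td, Tj
)

IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def text_fragments(content):
    """Yield (x, y, text) for every Tj/TJ operator in `content`.

    x and y come from the text line matrix set by Tm, Td and TD; the CTM
    ("cm"), T*, font encodings and glyph advances are ignored."""
    operands = []      # (kind, value) pairs seen since the last operator
    line = IDENTITY    # text line matrix a b c d e f
    for m in TOKEN.finditer(content):
        kind = m.lastgroup
        if kind != "op":
            operands.append((kind, m.group(kind)))
            continue
        op = m.group("op")
        nums = [float(v) for k, v in operands if k == "num"]
        if op == "BT":                               # new text object
            line = IDENTITY
        elif op == "Tm" and len(nums) >= 6:          # set text matrix directly
            line = tuple(nums[-6:])
        elif op in ("Td", "TD") and len(nums) >= 2:  # move to next line start
            tx, ty = nums[-2:]
            a, b, c, d, e, f = line
            line = (a, b, c, d, tx * a + ty * c + e, tx * b + ty * d + f)
        elif op in ("Tj", "TJ"):                     # show text; TJ numbers are kerning
            yield (line[4], line[5], "".join(v for k, v in operands if k == "str"))
        operands = []


stream = "BT /F1 12 Tf 72 700 Td (Invoice) Tj 0 -14 Td [(1) -20 (234)] TJ ET"
for fragment in text_fragments(stream):
    print(fragment)
# (72.0, 700.0, 'Invoice')
# (72.0, 686.0, '1234')
```

Note that the `TJ` array in the sample stores the visible text `1234` as two pieces, `(1)` and `(234)`, with a kerning adjustment between them.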

The second thing I'm trying to do is to split these same PDF files and merge the pieces into separate files based on the incoming data. Thank you to whoever wrote the split and merge routines. That part of the job will be easy to manage and won't take long to code once I get the first part out of the way.

Thanks in advance for any help or assistance that can be provided.


Posted: Tue Jul 15, 2008 10:35 pm

Joined: Tue Jul 15, 2008 10:12 pm
Posts: 1
Hello,
Has anyone found any more information on this problem? I am looking for the exact same information as in the original post (text objects with location information)...

Any ideas?
Greg


Posted: Wed Jul 16, 2008 8:20 am
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3096
Location: Cologne, Germany
Hi!

Depending on the PDF creator and the font, words may appear as small substrings in the PDF file.
What looks like "word" to the human reader may appear as "w", "o", and "rd" (or as "wo" and "rd" or maybe "w" and "ord") in the PDF file.
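One heuristic for coping with split words, sketched in Python under stated assumptions: given fragments as (x, y, text) tuples in PDF coordinates (the tuple layout and the 1-unit baseline tolerance are assumptions; the fragments come from whatever extraction method you use), group the fragments that share a baseline, join them left to right and search the joined line. Joining without separators can also create false matches across word boundaries, so a position check is still useful. `find_across_fragments` is a hypothetical helper:

```python
def find_across_fragments(fragments, target, y_tolerance=1.0):
    """Find `target` even when it is split over several (x, y, text) fragments.

    Fragments whose baselines differ by at most `y_tolerance` form one line,
    which is joined left to right without separators and searched as a whole.
    Returns the (x, y) of the fragment holding the first character of each match."""
    lines = []  # each entry: [baseline_y, list of fragments]
    for frag in sorted(fragments, key=lambda f: (-f[1], f[0])):  # top to bottom
        if lines and abs(lines[-1][0] - frag[1]) <= y_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append([frag[1], [frag]])
    hits = []
    for _, line_frags in lines:
        joined, starts = "", []  # starts: (offset in joined, x, y) per fragment
        for x, y, text in sorted(line_frags, key=lambda f: f[0]):  # left to right
            starts.append((len(joined), x, y))
            joined += text
        pos = joined.find(target)
        while pos != -1:
            # The last fragment starting at or before the match holds its first char.
            _, x, y = [s for s in starts if s[0] <= pos][-1]
            hits.append((x, y))
            pos = joined.find(target, pos + 1)
    return hits


# "word" stored as "w", "o", "rd" on one line and as a whole word on another.
fragments = [(100.0, 500.0, "w"), (107.0, 500.0, "o"), (114.0, 500.2, "rd"),
             (100.0, 480.0, "word")]
print(find_across_fragments(fragments, "word"))
# [(100.0, 500.0), (100.0, 480.0)]
```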

Finding strings may be easier if you use a fixed-pitch font like Courier, but even this does not work with all PDF creators.
However, most PDF creators will write "word" as one string if a fixed-pitch font is used.
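A fixed-pitch font also makes positions inside a string easy to compute: in the standard Courier font metrics every glyph is 600 units wide per 1000 units of text space, so the i-th character of a string drawn at x starts at x + i * 600 * size / 1000. A Python sketch, assuming the string is drawn horizontally with the standard Courier metrics and no character spacing (Tc), word spacing (Tw), horizontal scaling (Tz) or kerning; `substring_x` is a hypothetical helper, not part of PDFsharp:

```python
COURIER_WIDTH = 600  # advance of every Courier glyph, in 1/1000 of the font size


def substring_x(fragment_x, text, target, font_size):
    """Return the x where `target` starts inside `text`, a Courier string drawn
    at `fragment_x`, or None if `target` does not occur. Ignores Tc, Tw, Tz."""
    index = text.find(target)
    if index < 0:
        return None
    return fragment_x + index * COURIER_WIDTH * font_size / 1000


# "1234" is preceded by 8 characters ("Invoice "), each 6 units wide at 10 pt.
print(substring_x(72.0, "Invoice 1234", "1234", 10.0))
# 120.0
```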

_________________
Regards
Thomas Hoevel
PDFsharp Team

