PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
PostPosted: Tue Dec 07, 2010 3:09 pm 

Joined: Mon Mar 09, 2009 11:19 am
Posts: 12
Hello,

I would like to replace one string with another in a PDF. Is that possible? Thank you very much.


PostPosted: Wed Dec 08, 2010 1:28 pm 

Joined: Mon Mar 09, 2009 11:19 am
Posts: 12
Hello,

I have not received an answer to my question yet. Is this possible?
Thank you very much.


PostPosted: Wed Dec 08, 2010 3:26 pm 
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3109
Location: Cologne, Germany
Possible? Yes.
Complicated? Yes.

See here:
viewtopic.php?p=3816#p3816

_________________
Regards
Thomas Hoevel
PDFsharp Team


PostPosted: Wed Dec 08, 2010 3:41 pm 

Joined: Mon Mar 09, 2009 11:19 am
Posts: 12
Thomas Hoevel wrote:
Possible? Yes.
Complicated? Yes.

See here:
viewtopic.php?p=3816#p3816


Hello,
Thank you very much for your answer. I have no problem extracting text from a PDF; my problem is replacing one string with another. For example, I would like to replace every "M." with "Mme." in my PDF. Is that possible?
Thank you.


PostPosted: Wed Dec 08, 2010 4:44 pm 
Supporter

Joined: Thu May 27, 2010 7:40 pm
Posts: 59
Location: New Hampshire, USA
Hello,

I'm no PDF Expert... but I think...

In PDF, each letter in a word could be written with a different font or style and would then become its own 'element' in the content stream, even though it looks like a single word when rendered.
This could make it difficult to search for a specific word. In that case I think you would have to assemble the page text to know which characters are rendered next to each other, and in which order, before you could do the search/replace accurately (especially if the document was edited more than once, given the way the catalogs and edits get applied by Acrobat). That's my guess, anyway.

If you control how the document was created, then I would think you could be reasonably successful doing string replacements by opening it with the PdfReader class and iterating over the document's contents, as long as all the search terms were written using the same font and style and were not manually edited afterwards.
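
To illustrate the point about fragmented text, here is a minimal sketch in plain C# (no PDFsharp involved) of why a naive search can miss a word. The two content-stream snippets are hand-written examples, not taken from a real file, and the class and method names are made up for illustration. The first shows a word with a single Tj operator; the second splits the same word across a TJ array with a kerning adjustment, which PDF producers commonly do.

```csharp
using System;

static class FragmentedTextDemo
{
    // Hypothetical content-stream fragments, written by hand for illustration.
    // Both would render as "Hello" on the page.
    public const string Contiguous = "BT /F1 12 Tf (Hello) Tj ET";
    public const string Fragmented = "BT /F1 12 Tf [(He) -20 (llo)] TJ ET";

    // Naive check: does the raw stream text contain the search term?
    public static bool NaiveContains(string contentStream, string term)
    {
        return contentStream.IndexOf(term, StringComparison.Ordinal) >= 0;
    }

    static void Main()
    {
        Console.WriteLine(NaiveContains(Contiguous, "Hello")); // True
        Console.WriteLine(NaiveContains(Fragmented, "Hello")); // False
    }
}
```

So a plain string search only works when the producer happened to write the search term as one contiguous literal string.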

I do something similar, but I'm processing Annotations (hyperlinks) which are a bit easier to find and update. I wish I had some code to pass along, but I don't.

Perhaps you should look at the document explorer example in the Samples and see how it parses out the contents of a word document for display and see if you can use these techniques to solve your problem. I'm not sure offhand what the name of that project file is, but if you load the Samples master solution, you should be able to find it.

Let us know what you find out!

-Jeff


PostPosted: Tue Aug 09, 2011 2:24 pm 

Joined: Tue Aug 09, 2011 2:00 pm
Posts: 1
Bit of a necro post coming up, but I never found the answer to this question on this forum myself.

I was stuck on this problem for a while but finally solved it using PDFsharp. The trick is to read the stream from page.Contents.Elements.GetDictionary().Stream, convert it into a string, perform string.Replace() on all the parts you need, then convert the new string back into a byte array and assign it to page.Contents.Elements.GetDictionary().Stream.Value.

Code:

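// Needs: using System.Text; using PdfSharp.Pdf;
// importDoc and exportDoc are PdfDocument instances created elsewhere
// (their setup is not shown in this post).
// Note: this only finds text if the content stream is stored uncompressed.

for (int i = 0; i < importDoc.PageCount; i++)
{
    PdfPage newPage = importDoc.Pages[i];

    for (int j = 0; j < newPage.Contents.Elements.Count; j++)
    {
        PdfDictionary.PdfStream stream = newPage.Contents.Elements.GetDictionary(j).Stream;
        byte[] inStream = stream.Value;

        // Build the string for this content stream only (one char per byte).
        StringBuilder builder = new StringBuilder(inStream.Length);
        foreach (byte b in inStream)
            builder.Append((char)b);

        // "tag" is the placeholder to search for; "replacement text" is an
        // example value, put your own text here.
        string stringStream = builder.ToString().Replace("tag", "replacement text");

        // Convert the string back into bytes and store it in the stream.
        byte[] outStream = PdfDictionary.PdfStream.RawEncode(stringStream);
        stream.Value = outStream;
    }

    newPage = exportDoc.AddPage(newPage);
}

exportDoc.Save("Path.pdf");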

Enjoy!
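
A follow-up sketch on two details of this approach, in plain C# without PDFsharp. First, casting each byte to a char is the same as decoding with ISO-8859-1, so an Encoding object can do the round trip without losing bytes. Second, calling Replace() on the whole stream can also hit operators or resource names, so this version only touches literal strings that are followed by a Tj operator. The class and method names are hypothetical, and the sketch assumes an uncompressed content stream, a single-byte font encoding, and no escaped or nested parentheses inside the strings. It also cannot help if the font is embedded as a subset that lacks the glyphs of the new text.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class LiteralStringReplace
{
    // ISO-8859-1 maps bytes 0..255 to the same char codes, so the round trip is lossless.
    static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Matches "(text) Tj" where text contains no parentheses or backslashes.
    // Escaped or nested parentheses are deliberately not handled in this sketch.
    static readonly Regex ShowText = new Regex(@"\(([^()\\]*)\)(\s*Tj)");

    public static byte[] ReplaceInLiterals(byte[] content, string oldText, string newText)
    {
        string text = Latin1.GetString(content);

        // Rewrite only the text between the parentheses; keep the operator as-is.
        string result = ShowText.Replace(text, m =>
            "(" + m.Groups[1].Value.Replace(oldText, newText) + ")" + m.Groups[2].Value);

        return Latin1.GetBytes(result);
    }

    static void Main()
    {
        // Hand-written example fragment, not taken from a real PDF.
        byte[] input = Latin1.GetBytes("BT /F1 12 Tf (M. Dupont) Tj ET");
        byte[] output = ReplaceInLiterals(input, "M.", "Mme.");
        Console.WriteLine(Latin1.GetString(output));
        // prints: BT /F1 12 Tf (Mme. Dupont) Tj ET
    }
}
```

The returned byte array could then be assigned to Stream.Value in place of the RawEncode() result in the loop above.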


PostPosted: Wed Oct 02, 2013 1:14 pm 

Joined: Wed Oct 02, 2013 1:08 pm
Posts: 1
Hello

I am completely new to this topic and need to replace a placeholder in a PDF file.
Could you please make a sample project available for me (C# or VB.NET)?

Kind regards,
Ralph Koerber


PostPosted: Fri Jan 03, 2014 8:07 pm 

Joined: Thu Jan 02, 2014 5:51 pm
Posts: 3
Ralph - I am looking to do this very same thing... does anyone have any ideas?

q-kev

