PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
All times are UTC


PostPosted: Mon Feb 26, 2007 10:00 pm 

Joined: Sun Feb 25, 2007 1:19 pm
Posts: 1
Hi there all... I've tried creating a few test applications to answer this question, but cannot figure it out!

Can someone give me a simple example showing how to extract all the text in a PDF document into a single string? I would *greatly* appreciate any help you can provide!


PostPosted: Wed May 16, 2007 5:46 pm 

Joined: Fri Mar 23, 2007 11:37 pm
Posts: 16
Location: Berlin
Hello,
this is a very dirty solution, but it shows one way to get what you want. You have to take care of the encoding yourself, because the example assumes that the PDF text is encoded in the default system encoding.

It extracts text from the first page only.

Code:
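using System.Diagnostics;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Advanced;
using PdfSharp.Pdf.IO;

// Matches a text operator (Tw, Td, Tc, Tm or T*) followed by a string operand and "Tj".
// The named group "text" captures a literal string written in parentheses.
string pdfTextRegexp = @"(T[wdcm*])[\s]*(\[([^\]]*)\]|\((?<text>[^\)]*)\))[\s]*Tj";

// "file" is the path of the PDF to read.
PdfDocument r = PdfReader.Open(file);
// Only the content streams of the first page are examined.
PdfContents contents = r.Pages[0].Contents;
foreach (PdfReference o in contents.Elements) {
   PdfContent c = o.Value as PdfContent;
   if (c != null) {
      // Assumes the stream bytes are text in the default system encoding.
      string content = Encoding.Default.GetString(c.Stream.Value);
      using (StringReader sr = new StringReader(content)) {
         string line;
         // Scan the content stream line by line; only the first match per line is reported.
         while ((line = sr.ReadLine()) != null) {
            Match m = Regex.Match(line, pdfTextRegexp, RegexOptions.Compiled);
            if (m.Success) {
               Debug.WriteLine(m.Groups["text"].Value);
            }
         }
      }
   }
}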
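
To get the text into a single string instead of writing it to the debug output, the matching step can be pulled out into a small helper. The following is only a sketch: the class name ContentStreamText, the method Extract, and the sample content stream are made up for illustration, and joining the fragments with a single space is an assumption about how the pieces should be combined. It uses the same regular expression as above and needs no PDFsharp types, so it can be tried on its own.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

static class ContentStreamText
{
    // Same pattern as in the post above: a text operator (Tw, Td, Tc, Tm, T*),
    // then a string operand, then "Tj". Group "text" holds a (...) literal.
    static readonly Regex PdfText = new Regex(
        @"(T[wdcm*])[\s]*(\[([^\]]*)\]|\((?<text>[^\)]*)\))[\s]*Tj",
        RegexOptions.Compiled);

    // Hypothetical helper: takes an already decoded content stream as text
    // and returns every matched fragment joined by single spaces.
    public static string Extract(string content)
    {
        var sb = new StringBuilder();
        using (var sr = new StringReader(content))
        {
            string line;
            while ((line = sr.ReadLine()) != null)
            {
                Match m = PdfText.Match(line);
                // The "text" group is only set for the (...) alternative.
                if (m.Success && m.Groups["text"].Success)
                {
                    if (sb.Length > 0) sb.Append(' ');
                    sb.Append(m.Groups["text"].Value);
                }
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Invented sample data that mimics a very simple content stream.
        string sample =
            "BT\n" +
            "/F1 12 Tf\n" +
            "1 0 0 1 72 720 Tm (Hello) Tj\n" +
            "0 -14 Td (World) Tj\n" +
            "ET\n";
        Console.WriteLine(Extract(sample)); // prints "Hello World"
    }
}
```

In the loop above, the decoded content string could be passed to Extract for each content stream of each page and the results appended to one StringBuilder. The limitations of the pattern remain: a line with a string and Tj but no preceding text operator is skipped, and escaped parentheses inside a string are not handled.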


If anyone has a better solution, ideally one that uses the PDFsharp API, please contribute.

Regards,
André

