PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
 Post subject: Extracting text from pdf
Posted: Thu Nov 08, 2007 8:04 pm
Hi,

Is it possible to extract text from a PDF file?

It would be even better if I could extract the text from an area of a single page instead of from the entire file.

I am trying to do it with PDFsharp, but I cannot find a way to do it.

TIA,
Luiz Papa


 Post subject: Extracting Text from PDF
Posted: Fri Nov 09, 2007 2:33 pm
I have been trying to do that, too. I was able to use ContentReader to read a page and loop through all of the CObjects in its content, but I couldn't figure out how to display the content of an object or how to determine whether it contains any text.

Dave Galloway
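
For anyone trying the ContentReader route described above, here is a minimal sketch of how the content objects could be walked to find text. It assumes that text is shown with the Tj and TJ operators and that their string operands are readable as they are. The class name PageTextSketch and both method names are made up for this example. The strings come back in the font's own encoding, so pages that use subset or CID fonts may yield unreadable characters. The ' and " text operators are not handled, and no position information is returned, so this does not solve extraction by area.

```csharp
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Content;
using PdfSharp.Pdf.Content.Objects;

// Hypothetical helper class, not part of PDFsharp.
public static class PageTextSketch
{
    // Parses the content stream of a page and returns the raw string
    // operands of every Tj / TJ operator found in it.
    public static IEnumerable<string> ExtractPageText(PdfPage page)
    {
        CSequence content = ContentReader.ReadContent(page);
        return ExtractText(content);
    }

    // Recursively walks a content object and yields the strings it carries.
    public static IEnumerable<string> ExtractText(CObject obj)
    {
        COperator op = obj as COperator;
        if (op != null)
        {
            // Only the text-showing operators are of interest here.
            if (op.OpCode.Name == "Tj" || op.OpCode.Name == "TJ")
            {
                foreach (CObject operand in op.Operands)
                    foreach (string s in ExtractText(operand))
                        yield return s;
            }
            yield break;
        }

        // CArray (the operand of TJ) derives from CSequence, so this
        // branch covers both the page content and TJ arrays.
        CSequence seq = obj as CSequence;
        if (seq != null)
        {
            foreach (CObject element in seq)
                foreach (string s in ExtractText(element))
                    yield return s;
            yield break;
        }

        // Numbers inside TJ arrays (kerning adjustments) are skipped;
        // only string objects are returned.
        CString str = obj as CString;
        if (str != null)
            yield return str.Value;
    }
}
```

To try it, open a document with PdfReader.Open and pass one of its pages, for example doc.Pages[0], to PageTextSketch.ExtractPageText. The result is a flat list of fragments in content-stream order, which is not necessarily reading order.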


 Post subject: Pdfbox
Posted: Fri Nov 09, 2007 4:39 pm
I think I will use pdfbox to do that.

The code below does exactly what I want. The only drawback is that I have to add IKVM to my project references.

// C# calling PDFBox through IKVM; x, y, width and height describe the area to read.
org.pdfbox.pdmodel.PDDocument doc = org.pdfbox.pdmodel.PDDocument.load(txtFile.Text);
try
{
    // Register the rectangular region whose text should be extracted.
    org.pdfbox.util.PDFTextStripperByArea stripper = new org.pdfbox.util.PDFTextStripperByArea();
    java.awt.geom.Rectangle2D rect = new java.awt.geom.Rectangle2D.Double(x, y, width, height);
    stripper.addRegion("regiao1", rect);
    stripper.setSortByPosition(true);
    // Take the first page of the document and run the extraction on it.
    org.pdfbox.pdmodel.PDDocumentCatalog cat = doc.getDocumentCatalog();
    org.pdfbox.pdmodel.PDPageNode pn = cat.getPages();
    org.pdfbox.pdmodel.PDPage pag = pn.getKids().toArray()[0] as org.pdfbox.pdmodel.PDPage;
    stripper.extractRegions(pag);
    return stripper.getTextForRegion("regiao1");
}
finally
{
    doc.close(); // release the underlying file
}

