PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

From Stream.value to
https://forum.pdfsharp.net/viewtopic.php?f=2&t=1902

Author:  ag3 [ Fri Feb 10, 2012 11:42 am ]
Post subject:  From Stream.value to

Hi,

I want to extract text from my PDF.

Currently I do this:

Code:
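PdfDocument document = PdfReader.Open(this.filePath);
foreach (PdfPage page in document.Pages)
{
    // Each element of page.Contents is one content stream of the page
    for (int index = 0; index < page.Contents.Elements.Count; index++)
    {
        PdfDictionary.PdfStream stream = page.Contents.Elements.GetDictionary(index).Stream;
        // Copy the stream bytes one-to-one into chars (same result as ISO-8859-1 decoding)
        StringBuilder res = new StringBuilder(); // requires: using System.Text;
        foreach (byte cd in stream.Value)
            res.Append((char)cd);
        //TODO: res encoding invalid
    }
}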


My variable res contains readable text, but also encoded text.
I tried Unicode and ISO encodings without success.

Quote:
res contains:
BT
/R7 9.96 Tf
0.999386 0 0 1 278.4 761.6 Tm
( )Tj
-221.896 -12.12 Td
(\n \r  )Tj
227.54 -675.96 Td
( )Tj
ET


I am looking for something like (Hello World)Tj.
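
For reference, the string operands of the Tj operators in a dump like the one quoted above could be collected with a small scanner. This is only a minimal sketch: ContentStreamText and ExtractTjStrings are hypothetical names, not part of PDFsharp, and the code assumes the content stream is already available as readable text. It handles literal strings in parentheses followed by Tj only, not hex strings in angle brackets, TJ arrays, or the ' and " operators.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of PDFsharp.
public static class ContentStreamText
{
    // Returns the unescaped literal string operand of every "(...) Tj" found in the text.
    public static List<string> ExtractTjStrings(string content)
    {
        var result = new List<string>();
        int i = 0;
        while (i < content.Length)
        {
            if (content[i] != '(') { i++; continue; }

            i++; // skip the opening parenthesis
            var sb = new StringBuilder();
            int depth = 1; // literal strings may contain balanced parentheses
            while (i < content.Length && depth > 0)
            {
                char c = content[i++];
                if (c == '\\' && i < content.Length)
                {
                    char e = content[i++];
                    switch (e)
                    {
                        case 'n': sb.Append('\n'); break;
                        case 'r': sb.Append('\r'); break;
                        case 't': sb.Append('\t'); break;
                        case 'b': sb.Append('\b'); break;
                        case 'f': sb.Append('\f'); break;
                        case '\n': break; // backslash + end of line: line continuation
                        case '\r':
                            if (i < content.Length && content[i] == '\n') i++;
                            break;
                        default:
                            if (e >= '0' && e <= '7')
                            {
                                // Octal character code, one to three digits
                                int code = e - '0';
                                int digits = 1;
                                while (digits < 3 && i < content.Length
                                       && content[i] >= '0' && content[i] <= '7')
                                {
                                    code = code * 8 + (content[i++] - '0');
                                    digits++;
                                }
                                sb.Append((char)(code & 0xFF));
                            }
                            else
                            {
                                sb.Append(e); // \( \) \\ and unknown escapes: keep the character
                            }
                            break;
                    }
                }
                else if (c == '(') { depth++; sb.Append(c); }
                else if (c == ')') { depth--; if (depth > 0) sb.Append(c); }
                else sb.Append(c);
            }

            // Keep the string only if the next token starts with the Tj operator
            int j = i;
            while (j < content.Length && char.IsWhiteSpace(content[j])) j++;
            if (j + 1 < content.Length && content[j] == 'T' && content[j + 1] == 'j')
                result.Add(sb.ToString());
        }
        return result;
    }
}
```

Calling ContentStreamText.ExtractTjStrings("BT /R7 9.96 Tf (Hello World)Tj ET") returns a list whose only element is "Hello World". The characters returned are character codes in the encoding of the font selected by Tf (here /R7), so they match readable text only when that font uses a standard encoding.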

Maybe it is encoded through the font?

Could you give me some hints on how to decode the text?

Thanks

Regards,
alex

Author:  Thomas Hoevel [ Mon Feb 13, 2012 9:38 am ]
Post subject:  Re: From Stream.value to

Hi!

Not my area of expertise.

Maybe this thread will help:
viewtopic.php?p=4010#p4010
