Hi,
I am trying to extract text from my PDF.
Currently I do this:
Code:
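using System.Text;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

PdfDocument document = PdfReader.Open(this.filePath);
foreach (PdfPage page in document.Pages)
{
    // A page can have several content streams; walk each one.
    for (int index = 0; index < page.Contents.Elements.Count; index++)
    {
        PdfDictionary.PdfStream stream = page.Contents.Elements.GetDictionary(index).Stream;
        // Copy the raw stream bytes into a string, one char per byte.
        StringBuilder res = new StringBuilder();
        foreach (byte cd in stream.Value)
            res.Append((char)cd);
        // TODO: res does not decode to readable text
    }
}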
My variable res contains text, but also content that looks encoded.
I tried Unicode and ISO encodings without success.
Quote:
res contains:
BT
/R7 9.96 Tf
0.999386 0 0 1 278.4 761.6 Tm
( )Tj
-221.896 -12.12 Td
(\n \r)Tj
227.54 -675.96 Td
( )Tj
ET
I am looking for something like (Hello World)Tj.
Maybe it is encoded through the font?
Could you give me some hints on how to decode the text?
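To make the goal concrete, here is a minimal sketch of pulling the literal string operands of Tj out of a content stream like the one quoted above. TjExtractor and ExtractTjStrings are hypothetical names, not part of PDFsharp. The sketch assumes the stream has already been decompressed into text, and it only handles literal strings in parentheses followed by Tj, not hex strings or TJ arrays. It also assumes the bytes map directly to characters; if the font uses a custom encoding, the extracted strings would still need to be mapped through the font.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of PDFsharp.
public static class TjExtractor
{
    // Returns the literal-string operand of every Tj operator found in the
    // given (already decoded) content stream text.
    public static List<string> ExtractTjStrings(string content)
    {
        var results = new List<string>();
        int i = 0;
        while (i < content.Length)
        {
            if (content[i] != '(') { i++; continue; }
            string text = ReadLiteral(content, ref i);
            // Skip whitespace between the operand and its operator.
            int j = i;
            while (j < content.Length && char.IsWhiteSpace(content[j])) j++;
            if (j + 1 < content.Length && content[j] == 'T' && content[j + 1] == 'j')
                results.Add(text);
        }
        return results;
    }

    // Reads a PDF literal string starting at s[i] == '(' and leaves i just
    // past the matching ')'. Handles nested parentheses and backslash escapes.
    private static string ReadLiteral(string s, ref int i)
    {
        var sb = new StringBuilder();
        int depth = 1;
        i++; // skip the opening '('
        while (i < s.Length && depth > 0)
        {
            char c = s[i++];
            if (c == '\\' && i < s.Length)
            {
                char e = s[i++];
                switch (e)
                {
                    case 'n': sb.Append('\n'); break;
                    case 'r': sb.Append('\r'); break;
                    case 't': sb.Append('\t'); break;
                    case 'b': sb.Append('\b'); break;
                    case 'f': sb.Append('\f'); break;
                    case '\n': break; // backslash + newline: line continuation
                    case '\r': if (i < s.Length && s[i] == '\n') i++; break;
                    default:
                        if (e >= '0' && e <= '7')
                        {
                            // Octal escape: up to three digits.
                            int code = e - '0';
                            for (int k = 0; k < 2 && i < s.Length && s[i] >= '0' && s[i] <= '7'; k++)
                                code = code * 8 + (s[i++] - '0');
                            sb.Append((char)(code & 0xFF));
                        }
                        else
                        {
                            sb.Append(e); // \( \) \\ and unknown escapes
                        }
                        break;
                }
            }
            else if (c == '(') { depth++; sb.Append(c); }
            else if (c == ')') { depth--; if (depth > 0) sb.Append(c); }
            else sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Run against the stream quoted above, this returns three strings, and all three contain only whitespace.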
Thx
Regards,
alex