PDFsharp & MigraDoc Foundation https://forum.pdfsharp.net/ |
From Stream.value to https://forum.pdfsharp.net/viewtopic.php?f=2&t=1902 |
Author: | ag3 [ Fri Feb 10, 2012 11:42 am ] |
Post subject: | From Stream.value to |
Hi,

I am looking to extract text from my PDF. Currently I do:

Code:
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

PdfDocument document = PdfReader.Open(this.filePath);
foreach (PdfPage page in document.Pages)
{
    for (int index = 0; index < page.Contents.Elements.Count; index++)
    {
        PdfDictionary.PdfStream stream = page.Contents.Elements.GetDictionary(index).Stream;
        String res = "";
        // Each byte is cast straight to a char.
        foreach (byte cd in stream.Value)
            res += (char)cd;
        //TODO: res encoding invalid
    }
}

My variable res contains some text, but the text itself comes out encoded. I tried Unicode and ISO encoders without success.

Quote:
res contains:
BT
/R7 9.96 Tf
0.999386 0 0 1 278.4 761.6 Tm
( )Tj
-221.896 -12.12 Td
(\n \r)Tj
227.54 -675.96 Td
( )Tj
ET

I am looking for something like (Hello World)Tj. Maybe the text is encoded through the font? Could you give me some hints on how to decode it?

Thanks.
Regards, alex
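A minimal sketch of the first step, in C# and independent of PDFsharp. Between BT and ET the stream holds operators such as Tf (select font), Tm/Td (position) and Tj (show a string), and each Tj operand is a PDF literal string in which escapes like \n and \r stand for single bytes. Read that way, (\n \r) would decode to the byte codes 0x0A 0x20 0x0D. Those are character codes in the encoding of the font selected by Tf (here /R7), not text, so readable characters need that font's /ToUnicode CMap or /Encoding. The helper names (TjStrings, Extract, ReadLiteral, Map) are hypothetical. The input is assumed to be an already decompressed content stream, only "(...) Tj" is handled, and the code-to-Unicode table is assumed to have been built from the font elsewhere.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: collects the operands of "(...) Tj" from a decompressed
// content stream and decodes PDF literal-string escapes into raw bytes.
// Not handled: TJ arrays, hex strings <...>, the ' and " operators, comments,
// and inline images (BI ... ID ... EI), which may contain arbitrary bytes.
static class TjStrings
{
    public static List<byte[]> Extract(byte[] content)
    {
        var result = new List<byte[]>();
        int i = 0;
        while (i < content.Length)
        {
            if (content[i] != '(') { i++; continue; }
            byte[] operand = ReadLiteral(content, ref i); // i is now just past ')'
            int j = i;
            while (j < content.Length && IsWhite(content[j])) j++;
            // Keep the string only if the next token is exactly "Tj".
            if (j + 1 < content.Length && content[j] == 'T' && content[j + 1] == 'j'
                && (j + 2 == content.Length || IsDelimOrWhite(content[j + 2])))
                result.Add(operand);
        }
        return result;
    }

    // i points at the opening '('; on return it points just past the matching ')'.
    static byte[] ReadLiteral(byte[] c, ref int i)
    {
        var bytes = new List<byte>();
        int depth = 1;
        i++;
        while (i < c.Length)
        {
            byte b = c[i++];
            if (b == '\\' && i < c.Length)
            {
                byte e = c[i++];
                switch (e)
                {
                    case (byte)'n': bytes.Add(10); break;
                    case (byte)'r': bytes.Add(13); break;
                    case (byte)'t': bytes.Add(9); break;
                    case (byte)'b': bytes.Add(8); break;
                    case (byte)'f': bytes.Add(12); break;
                    case (byte)'\n': break; // backslash + EOL is a line continuation
                    case (byte)'\r': if (i < c.Length && c[i] == '\n') i++; break;
                    default:
                        if (e >= '0' && e <= '7')
                        {
                            // \d, \dd or \ddd octal; high-order overflow is ignored.
                            int v = e - '0';
                            for (int k = 0; k < 2 && i < c.Length && c[i] >= '0' && c[i] <= '7'; k++)
                                v = v * 8 + (c[i++] - '0');
                            bytes.Add((byte)(v & 0xFF));
                        }
                        else bytes.Add(e); // \( \) \\ and unknown escapes: the character itself
                        break;
                }
            }
            else if (b == '\r')
            {
                // An unescaped CR or CRLF inside a string reads as a single LF.
                if (i < c.Length && c[i] == '\n') i++;
                bytes.Add(10);
            }
            else if (b == '(') { depth++; bytes.Add(b); }
            else if (b == ')') { if (--depth == 0) break; bytes.Add(b); }
            else bytes.Add(b);
        }
        return bytes.ToArray();
    }

    static bool IsWhite(byte b) => b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    static bool IsDelimOrWhite(byte b) => IsWhite(b) || "()<>[]{}/%".IndexOf((char)b) >= 0;

    // Maps one-byte codes through a code -> Unicode table (e.g. built from the
    // font's /ToUnicode CMap, not shown); unknown codes become U+FFFD.
    public static string Map(byte[] codes, IDictionary<byte, string> toUnicode)
    {
        var sb = new StringBuilder();
        foreach (byte code in codes)
            sb.Append(toUnicode.TryGetValue(code, out string s) ? s : "\uFFFD");
        return sb.ToString();
    }
}
```

For example, TjStrings.Extract(stream.Value) on an uncompressed stream like the one quoted would return one byte array per Tj, and TjStrings.Map(codes, table) would then look up each code. Composite (Type0) fonts usually use two-byte codes, which this one-byte lookup does not cover.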
Author: | Thomas Hoevel [ Mon Feb 13, 2012 9:38 am ] |
Post subject: | Re: From Stream.value to |
Hi! Not my area of expertise. Maybe this thread will help: viewtopic.php?p=4010#p4010 |
All times are UTC