PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

Need help with Hexadecimal strings
https://forum.pdfsharp.net/viewtopic.php?f=2&t=3519
Page 1 of 1

Author:  jma [ Wed Dec 28, 2016 3:15 pm ]
Post subject:  Need help with Hexadecimal strings

hi! :)

PDFsharp is a very cool library, but I have a problem with extracting text. The PDF reference explains: "Strings may also be written in hexadecimal form".
After extracting text I get some strings in hexadecimal form, but this data differs from the original text.

For example:
My PDF contains "Ajout d’une langue à un projet." and when I extract the text with PDFsharp, I get this: "00040169017D01B5019A0003011A035B01B50176011E0003016F01020176015001B5011E00030103000301B5017600030189018C017D0169011E019A0358".
I can't recover the original text. Indeed, when I convert this to ASCII, it gives me "i}µš[µvovPµµv‰Œ}išX", which is totally different from the original text.


Does anyone have any clue what the issue is and how to fix it?

Thanks in advance for your reply!

Author:  TH-Soft [ Wed Dec 28, 2016 4:13 pm ]
Post subject:  Re: Need help with Hexadecimal strings

Hi!

PDF files often contain only a subset of a Unicode font, and there should be a mapping table that allows you to translate the indexes in the hex string to Unicode values.
Can you copy the text to the clipboard using Adobe Reader?
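
To illustrate the point about indexes: the hex string from the question splits cleanly into 31 groups of four hex digits, the same count as the characters in the original sentence. The Python sketch below assumes 2-byte codes and uses the known sentence to rebuild a code-to-character table. The variable names are illustrative only, and a table rebuilt this way covers just the glyphs that occur in this one sentence.

```python
# Hex string reported in the question, split over several lines for readability.
hex_string = (
    "00040169017D01B5019A0003011A035B"
    "01B50176011E0003016F010201760150"
    "01B5011E00030103000301B501760003"
    "0189018C017D0169011E019A0358"
)

# Original sentence from the question.
# \u2019 is the typographic apostrophe, \u00e0 is "a" with grave accent.
expected = "Ajout d\u2019une langue \u00e0 un projet."

# Assumption: every code is 2 bytes, i.e. 4 hex digits.
codes = [hex_string[i:i + 4] for i in range(0, len(hex_string), 4)]

# Pair each code with the character at the same position and check that
# a code never maps to two different characters.
mapping = {}
for code, char in zip(codes, expected):
    assert mapping.setdefault(code, char) == char

# Decoding with the rebuilt table reproduces the sentence.
decoded = "".join(mapping[code] for code in codes)
print(len(codes), len(expected))
print(decoded)
```

Repeated codes line up with repeated letters (for example 01B5 for every "u" and 0003 for every space), which is consistent with the codes being indexes into the embedded font rather than ASCII or Unicode values.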

Author:  jma [ Thu Dec 29, 2016 3:40 pm ]
Post subject:  Re: Need help with Hexadecimal strings

Yes, I can copy the text to the clipboard using Adobe Reader.

I finally found the mapping table! I just have to use it to convert the hexadecimal form.
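
For anyone landing here later, this is roughly how such a mapping table can be applied. It is a minimal Python sketch, not PDFsharp code. It assumes the table is a ToUnicode CMap whose entries sit in `beginbfchar` ... `endbfchar` sections, that codes are 2 bytes wide, and that the CMap stream has already been extracted and decompressed. `SAMPLE_CMAP` is a hypothetical excerpt (the real table is not shown in this thread), and `parse_bfchar` and `decode_hex_string` are names made up for the example. `beginbfrange` sections, which real CMaps may also use, are not handled.

```python
import re

# Hypothetical ToUnicode CMap excerpt; the real table is not shown in the thread.
SAMPLE_CMAP = """
4 beginbfchar
<0003> <0020>
<0004> <0041>
<0103> <00E0>
<035B> <2019>
endbfchar
"""

def parse_bfchar(cmap_text):
    """Return {glyph code: Unicode string} from the bfchar sections only."""
    table = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        pairs = re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", section)
        for src, dst in pairs:
            # Destination values in a ToUnicode CMap are UTF-16BE.
            table[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return table

def decode_hex_string(hex_string, table, code_bytes=2):
    """Translate a hex string of fixed-width codes using the table."""
    step = code_bytes * 2  # hex digits per code
    chars = []
    for i in range(0, len(hex_string), step):
        code = int(hex_string[i:i + step], 16)
        # Codes missing from the table become U+FFFD (replacement character).
        chars.append(table.get(code, "\ufffd"))
    return "".join(chars)

table = parse_bfchar(SAMPLE_CMAP)
print(decode_hex_string("00040003035B0103", table))
```

Unknown codes are replaced instead of raising an error, so a partly supported table still yields readable output.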

Thanks for your answer, it helped me :D

All times are UTC