TH-Soft wrote:
ta221 wrote:
As for a final solution that fully fixes the issue:
in the ScanHexadecimalString method of the Lexer class, under the condition:
if(count > 2 && chars[0] == (char)0xFE && chars[1] == (char)0xFF)
the following line should be added at the end of the if block:
return this.symbol = Symbol.BigEndianUnicodeHexString;
The string is a "UnicodeHexString" and I'm afraid it might have unwanted side effects to introduce a new, secondary name "BigEndianUnicodeHexString" for "UnicodeHexString".
I can confirm there is a problem with Unicode properties and password protection. It's not the only problem with password protection.
Having two symbol types for one type of string is potentially harmful.
Note that I named the symbol Symbol.BigEndianUnicodeHexString.
However, I think Symbol.BigEndianUnicodeString would be a more fitting name, following the way Symbol.UnicodeString is named.
I ran some tests yesterday, including importing a PDF that had previously been saved by PDFsharp.
After these tests (including opening an encrypted PDF that was encrypted by PDFsharp), I found no particular issues, at least not with the document's properties.
I found that adding 0xFE and 0xFF before each UnicodeString / UnicodeHexString, that is, each string that meets the condition chars[0] == (char)0xFE && chars[1] == (char)0xFF, is important for Unicode characters to be displayed correctly when reading an encrypted PDF file.
Also, I found this didn't harm non-encrypted files.
However, unforeseen issues may still occur (I really hope there are none).
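To make the prefix handling concrete, here is a minimal, self-contained sketch of writing and detecting the 0xFE 0xFF prefix. It is not PDFsharp code: the class BomSketch and its methods are hypothetical names chosen for illustration, and the non-prefixed fallback (one byte per character) is an assumption of mine.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not part of PDFsharp.
static class BomSketch
{
    // Prefix the UTF-16BE bytes of 'text' with the byte order mark 0xFE 0xFF.
    public static byte[] EncodeWithBom(string text)
    {
        byte[] body = Encoding.BigEndianUnicode.GetBytes(text);
        byte[] result = new byte[body.Length + 2];
        result[0] = 0xFE;
        result[1] = 0xFF;
        Buffer.BlockCopy(body, 0, result, 2, body.Length);
        return result;
    }

    // Same test as the condition discussed above, applied to raw bytes.
    public static bool HasBom(byte[] bytes)
    {
        return bytes.Length > 2 && bytes[0] == 0xFE && bytes[1] == 0xFF;
    }

    // Decode as UTF-16BE when the prefix is present.
    // Assumption: otherwise each byte is treated as a single character.
    public static string Decode(byte[] bytes)
    {
        if (HasBom(bytes))
            return Encoding.BigEndianUnicode.GetString(bytes, 2, bytes.Length - 2);
        return new string(Array.ConvertAll(bytes, b => (char)b));
    }
}
```

With this sketch, BomSketch.Decode(BomSketch.EncodeWithBom(s)) gives back s for any non-empty string, while byte sequences without the prefix take the one-byte-per-character path.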
Update as of 01/08: After further investigation, I decided the correct approach is to split the cases into Symbol.BigEndianUnicodeHexString and Symbol.BigEndianUnicodeString to ensure correct output.
Furthermore, the new BigEndianUnicodeEncoding class, which is based on the RawUnicodeEncoding class, should use (2 * count + 2), (+ 2), and (- 2) in the appropriate places instead of (+ 8 / - 8).
This is because only the prefix "0xFE 0xFF" should be included; there are no additional bytes beyond these two.
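The byte arithmetic can be sketched as follows. This is one plausible reading, not the actual PDFsharp class: the name BomPrefixedBigEndianEncoding is hypothetical, and where exactly the + 2 and - 2 land is my assumption, based only on the two-byte prefix described above.

```csharp
using System;
using System.Text;

// Hypothetical sketch; not PDFsharp's BigEndianUnicodeEncoding or RawUnicodeEncoding.
sealed class BomPrefixedBigEndianEncoding : Encoding
{
    // Two bytes per char plus the two-byte prefix 0xFE 0xFF.
    public override int GetByteCount(char[] chars, int index, int count)
    {
        return 2 * count + 2;
    }

    public override int GetBytes(char[] chars, int charIndex, int charCount,
                                 byte[] bytes, int byteIndex)
    {
        bytes[byteIndex] = 0xFE;
        bytes[byteIndex + 1] = 0xFF;
        for (int i = 0; i < charCount; i++)
        {
            char c = chars[charIndex + i];
            bytes[byteIndex + 2 + 2 * i] = (byte)((c >> 8) & 0xFF); // high byte first
            bytes[byteIndex + 3 + 2 * i] = (byte)(c & 0xFF);        // then low byte
        }
        return 2 * charCount + 2;
    }

    // Assumes the input starts with the prefix: drop 2 bytes, then 2 bytes per char.
    public override int GetCharCount(byte[] bytes, int index, int count)
    {
        return Math.Max(0, (count - 2) / 2);
    }

    public override int GetChars(byte[] bytes, int byteIndex, int byteCount,
                                 char[] chars, int charIndex)
    {
        int n = GetCharCount(bytes, byteIndex, byteCount);
        for (int i = 0; i < n; i++)
        {
            int hi = bytes[byteIndex + 2 + 2 * i]; // + 2 skips the prefix
            int lo = bytes[byteIndex + 3 + 2 * i];
            chars[charIndex + i] = (char)((hi << 8) | lo);
        }
        return n;
    }

    public override int GetMaxByteCount(int charCount)
    {
        return 2 * charCount + 2;
    }

    public override int GetMaxCharCount(int byteCount)
    {
        return byteCount / 2; // upper bound
    }
}
```

In this sketch, encoding a two-character string gives 2 * 2 + 2 = 6 bytes, and decoding those 6 bytes gives (6 - 2) / 2 = 2 characters, so no bytes beyond the prefix are added or dropped.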
Thank you for listening.