PDFsharp & MigraDoc Foundation

End of inline image not detected properly + patch

Author:  Gerben Vos [ Thu Sep 08, 2016 7:32 pm ]
Post subject:  End of inline image not detected properly + patch

PDFsharp does not scan properly for the end of an inline image in a content stream. Instead, it scans for an 'E' character followed by an 'I' character (EI is the end-of-image token), and that byte pair can also occur by chance inside the image data. To be fair, the PDF spec is underspecified here: it gives no well-defined criteria for where the inline image data ends, so every implementation needs a bit of guesswork.

Depending on the inline image data, this can result in one of these error messages:
- Unexpected character '0x00a5' in content stream. ('0x00a5' may be a different value) ( http://www.stillhq.com/pdfdb/000351/data.pdf )
- The given key was not present in the dictionary. (found in confidential client PDFs only, sorry)
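
To illustrate how a bare E, I scan can go wrong, here is a minimal Python sketch. It is not PDFsharp code: the function name `naive_find_ei` and the content stream bytes are made up for the example, and the only assumption is that the image data happens to contain the bytes 'E', 'I'.

```python
def naive_find_ei(stream: bytes, start: int) -> int:
    """Return the index of the first 'E' followed by 'I' at or after start, or -1."""
    return stream.find(b"EI", start)


# Hypothetical content stream: an inline image whose data happens to contain "EI".
header = b"BI /W 2 /H 2 /BPC 8 /CS /G ID "
data = b"\x10EI\xa5\x20\x7f"              # made-up image bytes with an embedded E, I pair
stream = header + data + b"\nEI\nQ\n"

data_start = len(header)                  # first byte of the image data
true_end = data_start + len(data) + 1     # index of the real EI, after the newline
naive_end = naive_find_ei(stream, data_start)

# The naive scan stops inside the image data, one byte after data_start.
print(naive_end, true_end)
# Whatever follows the false EI is then parsed as content stream syntax.
print(hex(stream[naive_end + 2]))
```

In this made-up stream the byte right after the false EI is 0xa5, which a parser would then try to read as an operator or operand. That is one plausible way to end up with an "unexpected character" error like the first one above.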

I resolved this by scanning for <whitespace>EI<whitespace> instead of just EI. This works for all the PDFs I have encountered. I also had a peek at the iText implementation, and it does the same, with one exception: when it can determine the exact length of the inline image data from the image's metadata, it uses that length instead. That case is very uncommon, though, since most inline images, including all the ones I encountered, are compressed.
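
For readers who want to see the idea without opening the patch, here is a rough Python sketch of a whitespace-delimited EI scan. It is not the attached patch and not the iText code. The names `PDF_WHITESPACE` and `find_inline_image_end` are hypothetical, and treating end-of-stream directly after EI as a valid delimiter is my own assumption.

```python
# The six whitespace characters defined by the PDF specification:
# NUL, TAB, LF, FF, CR, SPACE.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def find_inline_image_end(stream: bytes, start: int) -> int:
    """Return the index of the 'E' in a whitespace-delimited EI, or -1 if none.

    Assumption: hitting end-of-stream right after EI counts as a delimiter.
    """
    pos = max(start, 1)  # there must be room for a byte before the 'E'
    while True:
        pos = stream.find(b"EI", pos)
        if pos < 0:
            return -1  # ran into EOF without finding a delimited EI
        before_ok = stream[pos - 1] in PDF_WHITESPACE
        after = pos + 2
        after_ok = after >= len(stream) or stream[after] in PDF_WHITESPACE
        if before_ok and after_ok:
            return pos
        pos += 1  # false match inside the image data, keep scanning


# Hypothetical stream whose image data contains a bare "EI" followed by 0xa5.
header = b"BI /W 2 /H 2 /BPC 8 /CS /G ID "
data = b"\x10EI\xa5\x20\x7f"
stream = header + data + b"\nEI\nQ\n"

end = find_inline_image_end(stream, len(header))
print(stream[len(header):end - 1] == data)  # image data without the trailing newline
```

This is still a heuristic: binary data that happens to contain whitespace, EI, whitespace would fool it too, which is why using a known data length is preferable whenever one is available.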

Also added extra checks for EOF (untested).

Patch attached.

pdfsharp-688.zip [593 Bytes]
