PDFsharp & MigraDoc Foundation https://forum.pdfsharp.net/
Bug: LZW decoder fails to emit some bytes https://forum.pdfsharp.net/viewtopic.php?f=3&t=3410
Author: Gerben Vos [ Thu Aug 04, 2016 2:33 pm ]
Post subject: Bug: LZW decoder fails to emit some bytes
What happens: http://www.stillhq.com/pdfdb/000590/data.pdf causes PDFsharp to emit the warning "Invalid number of operands". (Note: this pdf later causes PDFsharp to run out of memory. See http://forum.pdfsharp.net/viewtopic.php?f=3&t=3411 .)

Cause: This pdf contains an LZW-encoded contents stream. When decoding this stream, PDFsharp fails to emit some bytes. Here's a diff between the data as decoded by QPDF and PDFsharp, one byte per line (the 0A bytes at the beginning and end may or may not be significant):

--- 000590-obj3-qpdf.txt 2016-02-16 11:39:45.635258700 +0100
+++ 000590-obj3-pdfsharp.txt 2016-02-16 11:40:00.814040300 +0100
@@ -1,3 +1,4 @@
+0A
 71
 0D
 30
@@ -142,7 +143,6 @@
 2E
 33
 33
-33
 20
 54
 63
@@ -10470,7 +10470,6 @@
 2E
 31
 31
-31
 20
 39
 37
@@ -10710,7 +10709,6 @@
 2E
 39
 39
-39
 20
 37
 33
@@ -25930,7 +25928,6 @@
 30
 20
 30
-20
 30
 20
 6B
@@ -30328,4 +30325,3 @@
 0D
 66
 0D
-0A

Note: It would also be nice if the LZW decoder supported /EarlyChange .
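For reference, here is a minimal sketch of an LZW decoder for PDF streams that takes the /EarlyChange value as a parameter. It is written in Python for brevity and is neither the PDFsharp code nor the libtiff code; the name `lzw_decode` and its signature are made up for this illustration. It assumes the usual PDF conventions: codes are packed most significant bit first, start at 9 bits and grow to at most 12, code 256 clears the table, code 257 ends the data, and /EarlyChange defaults to 1.

```python
CLEAR_TABLE = 256    # resets the table and the code width
END_OF_DATA = 257    # terminates the stream
MAX_CODE_BITS = 12


def lzw_decode(data: bytes, early_change: int = 1) -> bytes:
    """Decode an LZW stream (hypothetical helper, not a PDFsharp API)."""

    def fresh_table():
        # Codes 0-255 are single bytes; 256 and 257 are reserved placeholders.
        return [bytes([i]) for i in range(256)] + [b"", b""]

    table = fresh_table()
    out = bytearray()
    prev = None        # last emitted string; None right after a clear
    code_bits = 9
    bit_buf = 0
    bit_count = 0

    for byte in data:
        bit_buf = (bit_buf << 8) | byte
        bit_count += 8
        while bit_count >= code_bits:
            # Take the next code from the top of the bit buffer.
            bit_count -= code_bits
            code = bit_buf >> bit_count
            bit_buf &= (1 << bit_count) - 1

            if code == END_OF_DATA:
                return bytes(out)
            if code == CLEAR_TABLE:
                table = fresh_table()
                code_bits = 9
                prev = None
                continue

            if prev is None:
                if code > 255:
                    raise ValueError("first code after a clear must be a literal byte")
                entry = table[code]
            else:
                if code < len(table):
                    entry = table[code]
                elif code == len(table):
                    # The code names the entry that is about to be created.
                    entry = prev + prev[:1]
                else:
                    raise ValueError(f"invalid LZW code {code}")
                if len(table) < (1 << MAX_CODE_BITS):
                    table.append(prev + entry[:1])

            out += entry
            prev = entry

            # With early_change=1 the width grows one table entry sooner.
            if code_bits < MAX_CODE_BITS and len(table) + early_change >= (1 << code_bits):
                code_bits += 1

    return bytes(out)  # tolerate a missing end-of-data code
```

The byte sequence 80 0B 60 50 22 0C 0C 85 01 from the LZW example in the PDF specification decodes to "-----A---B" with this sketch. A stream that short never reaches the first width change, so it does not exercise the `early_change` handling; that part needs a longer test stream.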
Author: Gerben Vos [ Mon Aug 15, 2016 12:53 pm ]
Post subject: Re: Bug: LZW decoder fails to emit some bytes
I fixed this locally by replacing the LZW decoder with the LZW decoder from libtiff, translated from C to C#. Since libtiff uses the BSD license and PDFsharp the MIT license, I am not sure whether this can be included as-is in PDFsharp. I also want to investigate a bit further, to work out where the two implementations differ and how to fix the PDFsharp implementation directly. The code could probably be C#-ified a bit further as well. Code available on request.
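One way to narrow down where two decoders diverge is to compare their outputs byte by byte, in the same one-byte-per-line form as the diff in the first post. The following is a rough sketch in Python using the standard `difflib` module; the helper names `hex_lines`, `byte_diff` and `first_mismatch` are made up for this illustration, and the decoded outputs are assumed to be available as byte strings.

```python
import difflib
from typing import List, Optional


def hex_lines(data: bytes) -> List[str]:
    # One two-digit uppercase hex value per byte, one byte per "line".
    return [f"{b:02X}" for b in data]


def byte_diff(reference: bytes, candidate: bytes, context: int = 3) -> List[str]:
    # Unified diff of the two byte sequences, with `context` bytes around each hunk.
    return list(difflib.unified_diff(
        hex_lines(reference), hex_lines(candidate),
        fromfile="reference", tofile="candidate",
        n=context, lineterm=""))


def first_mismatch(reference: bytes, candidate: bytes) -> Optional[int]:
    # Offset of the first differing byte, or None if the outputs are identical.
    for i, (a, b) in enumerate(zip(reference, candidate)):
        if a != b:
            return i
    if len(reference) != len(candidate):
        return min(len(reference), len(candidate))
    return None


if __name__ == "__main__":
    good = bytes.fromhex("2E3333332054")
    bad = bytes.fromhex("2E33332054")   # one 33 missing
    print(first_mismatch(good, bad))
    print("\n".join(byte_diff(good, bad)))
```

The offset from `first_mismatch` gives a position in the decoded data; relating it back to the code that produced it would require logging inside the decoder under test.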
Author: Gerben Vos [ Thu Aug 09, 2018 5:29 pm ]
Post subject: Re: Bug: LZW decoder fails to emit some bytes
Attached my code. The "2" subdirectory contains the newer, more C#-ified code. However, I may have removed a few lines from it that seemed non-functional at the time but that I later thought might still be important. I have therefore also included an older version in the "1" subdirectory, which is a more direct translation of the libtiff code. Both work identically for all PDFs in the test set I used. Both versions are BSD-licensed, like the code they are derived from (the older version is missing the copyright note, though); feel free to distribute them further. This code compiled with the 1.50-beta3b version of PDFsharp. I haven't tried it with newer versions.
All times are UTC