PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

Bug: LZW decoder fails to emit some bytes
https://forum.pdfsharp.net/viewtopic.php?f=3&t=3410

Author:  Gerben Vos [ Thu Aug 04, 2016 2:33 pm ]
Post subject:  Bug: LZW decoder fails to emit some bytes

What happens:

http://www.stillhq.com/pdfdb/000590/data.pdf causes PDFsharp to emit the warning "Invalid number of operands".

(Note: this PDF later causes PDFsharp to run out of memory. See http://forum.pdfsharp.net/viewtopic.php?f=3&t=3411 .)

Cause:

This PDF contains an LZW-encoded content stream. When decoding this stream, PDFsharp fails to emit some bytes.

Here's a diff between the stream data as decoded by QPDF and as decoded by PDFsharp, one hex byte per line (the 0A bytes at the beginning and end may or may not be significant):

--- 000590-obj3-qpdf.txt 2016-02-16 11:39:45.635258700 +0100
+++ 000590-obj3-pdfsharp.txt 2016-02-16 11:40:00.814040300 +0100
@@ -1,3 +1,4 @@
+0A
71
0D
30
@@ -142,7 +143,6 @@
2E
33
33
-33
20
54
63
@@ -10470,7 +10470,6 @@
2E
31
31
-31
20
39
37
@@ -10710,7 +10709,6 @@
2E
39
39
-39
20
37
33
@@ -25930,7 +25928,6 @@
30
20
30
-20
30
20
6B
@@ -30328,4 +30325,3 @@
0D
66
0D
-0A
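
In every hunk except the first and last, the byte PDFsharp drops sits inside a short repetition: `2E 33 33 33` comes out as `2E 33 33`, and `2E 31 31 31` as `2E 31 31`. Repetitions like these are where an LZW encoder produces the "KwKwK" case: it emits the code of a table entry it created only one step earlier, so the decoder receives a code one past the end of its own table and has to rebuild that entry as the previous string plus the previous string's first byte. A decoder that gets this branch wrong is one plausible explanation for the diff, but that has not been checked against the PDFsharp source here. The following is a minimal Python sketch of a PDF-style LZW codec (codes of 9 to 12 bits packed most significant bit first, 256 = clear table, 257 = end of data) that handles the case explicitly; all names in it are hypothetical, and it is neither PDFsharp's decoder nor a port of libtiff's.

```python
CLEAR_TABLE = 256   # resets the table and the code width to 9 bits
END_OF_DATA = 257
FIRST_CODE = 258    # first code assigned to a multi-byte string
MAX_TABLE = 4096    # codes never exceed 12 bits


def _fresh_table() -> list:
    # Entries 0-255 are the literal bytes; 256 and 257 are placeholders.
    return [bytes([i]) for i in range(256)] + [b"", b""]


def lzw_decode(data: bytes, early_change: int = 1) -> bytes:
    out = bytearray()
    table = _fresh_table()
    prev = None               # string produced by the previous code
    width, acc, nbits, pos = 9, 0, 0, 0
    while True:
        while nbits < width:  # pull in whole bytes, most significant bit first
            if pos == len(data):
                return bytes(out)  # input ended without an end-of-data code
            acc = (acc << 8) | data[pos]
            pos += 1
            nbits += 8
        nbits -= width
        code = acc >> nbits
        acc &= (1 << nbits) - 1
        if code == CLEAR_TABLE:
            table, prev, width = _fresh_table(), None, 9
            continue
        if code == END_OF_DATA:
            return bytes(out)
        if code < len(table):
            entry = table[code]
        elif code == len(table) and prev is not None:
            # KwKwK: the encoder used the entry it created one step earlier,
            # which must be prev followed by prev's first byte.
            entry = prev + prev[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out += entry
        if prev is not None and len(table) < MAX_TABLE:
            table.append(prev + entry[:1])
        prev = entry
        width = min(12, (len(table) + early_change).bit_length())


class _BitWriter:
    """Packs variable-width codes most significant bit first."""

    def __init__(self) -> None:
        self.out, self.acc, self.nbits = bytearray(), 0, 0

    def write(self, code: int, width: int) -> None:
        self.acc = (self.acc << width) | code
        self.nbits += width
        while self.nbits >= 8:
            self.nbits -= 8
            self.out.append((self.acc >> self.nbits) & 0xFF)
        self.acc &= (1 << self.nbits) - 1

    def finish(self) -> bytes:
        if self.nbits:  # pad the final partial byte with zero bits
            self.out.append((self.acc << (8 - self.nbits)) & 0xFF)
        return bytes(self.out)


def lzw_encode(data: bytes, early_change: int = 1) -> bytes:
    def width(n: int) -> int:
        return min(12, (n + early_change).bit_length())

    bits = _BitWriter()
    bits.write(CLEAR_TABLE, 9)
    table = {bytes([i]): i for i in range(256)}
    next_code, cur = FIRST_CODE, b""
    for byte in data:
        c = bytes([byte])
        if cur + c in table:
            cur += c
            continue
        # The decoder's table lags one entry behind ours, hence next_code - 1.
        bits.write(table[cur], width(next_code - 1))
        if next_code < MAX_TABLE:
            table[cur + c] = next_code
            next_code += 1
        else:  # table full: tell the decoder to start over (12-bit code)
            bits.write(CLEAR_TABLE, 12)
            table = {bytes([i]): i for i in range(256)}
            next_code = FIRST_CODE
        cur = c
    if cur:
        bits.write(table[cur], width(next_code - 1))
    # After that last code the decoder's table has caught up with ours.
    bits.write(END_OF_DATA, width(next_code))
    return bits.finish()
```

With `bytes.fromhex("2E333333")` as input, `lzw_encode` emits clear-table, 0x2E, 0x33, 259, end-of-data, and `lzw_decode` reaches code 259 while its table still ends at entry 258, so it is the KwKwK branch that restores the third `33`. The `early_change` argument follows the PDF /EarlyChange parameter (default 1).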

Note:

It would also be nice if the LZW decoder supported the /EarlyChange parameter.
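
For reference, the PDF specification defines /EarlyChange (0 or 1, default 1) as a parameter of the LZWDecode filter: with 1, the code width grows one code earlier than strictly necessary; with 0, the increase is postponed as long as possible. The hypothetical helper below is a minimal sketch of that rule from the decoder's side, assuming `next_code` is the index the decoder will assign to its next table entry (258 right after a clear-table code).

```python
def lzw_code_width(next_code: int, early_change: int = 1) -> int:
    """Width in bits of the next code a PDF LZWDecode decoder reads.

    next_code: index the decoder will assign to its next table entry
               (258 right after a clear-table code).
    early_change: the /EarlyChange value, 0 or 1 (PDF default: 1).
    """
    if early_change not in (0, 1):
        raise ValueError("EarlyChange must be 0 or 1")
    # The width grows when next_code + early_change reaches 512, 1024 or 2048,
    # and never exceeds 12 bits.
    return min(12, (next_code + early_change).bit_length())
```

With `early_change=1` the width becomes 10 bits once `next_code` reaches 511, and with `early_change=0` only at 512; the same one-code offset applies at the switches to 11 and 12 bits.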

Author:  Gerben Vos [ Mon Aug 15, 2016 12:53 pm ]
Post subject:  Re: Bug: LZW decoder fails to emit some bytes

I fixed this locally by replacing PdfSharp's LZW decoder with a C# translation of the LZW decoder from libtiff.

Since libtiff uses the BSD license and PdfSharp the MIT license, I am not sure whether this code can be included in PdfSharp as-is. I also want to investigate the difference between the two implementations further, to see whether I can fix the PdfSharp implementation directly.

Also, the code could probably be C#-ified a bit further.

Code available on request.

Author:  Gerben Vos [ Thu Aug 09, 2018 5:29 pm ]
Post subject:  Re: Bug: LZW decoder fails to emit some bytes

Attached is my code. The "2" subdirectory contains the newer, more C#-ified version. While cleaning it up I may have removed a few lines that seemed non-functional but that I later thought might still matter, so I've also included an older version in the "1" subdirectory, which is a more direct translation of the libtiff code. Both versions behave identically on all PDFs in the test set I used.

Both versions are BSD-licensed, like the code they're derived from (the older version is missing the copyright notice, though); feel free to distribute them further.

This code compiles against PdfSharp 1.50-beta3b. I haven't tried it with newer versions.

Attachments:
Pdf.Filters.zip [6.7 KiB]

All times are UTC