PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC


Posted: Thu Aug 04, 2016 2:33 pm

Joined: Tue Aug 02, 2016 9:56 am
Posts: 40
Location: Amsterdam, The Netherlands
What happens:

http://www.stillhq.com/pdfdb/000590/data.pdf causes PDFsharp to emit the warning "Invalid number of operands".

(Note: later on, this PDF also makes PDFsharp run out of memory. See http://forum.pdfsharp.net/viewtopic.php?f=3&t=3411)

Cause:

This PDF contains an LZW-encoded content stream. When decoding this stream, PDFsharp drops some bytes.

Here is a diff between the stream data as decoded by QPDF and by PDFsharp, written as one hex byte per line (the 0A bytes at the beginning and end may or may not be significant):

--- 000590-obj3-qpdf.txt 2016-02-16 11:39:45.635258700 +0100
+++ 000590-obj3-pdfsharp.txt 2016-02-16 11:40:00.814040300 +0100
@@ -1,3 +1,4 @@
+0A
71
0D
30
@@ -142,7 +143,6 @@
2E
33
33
-33
20
54
63
@@ -10470,7 +10470,6 @@
2E
31
31
-31
20
39
37
@@ -10710,7 +10709,6 @@
2E
39
39
-39
20
37
33
@@ -25930,7 +25928,6 @@
30
20
30
-20
30
20
6B
@@ -30328,4 +30325,3 @@
0D
66
0D
-0A
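
The diff above compares the two decoded streams as hex dumps with one byte per line. A minimal Python sketch for producing that kind of comparison, assuming both decoded streams are already available as bytes (the post does not say how they were dumped from QPDF and PDFsharp; the helper names hex_lines and diff_decoded and the labels are hypothetical):

```python
import difflib


def hex_lines(data: bytes) -> list[str]:
    """Render bytes as one uppercase two-digit hex value per line."""
    return [f"{b:02X}\n" for b in data]


def diff_decoded(reference: bytes, candidate: bytes,
                 ref_name: str = "qpdf", cand_name: str = "pdfsharp") -> str:
    """Unified diff of two decoded streams, one byte per line."""
    return "".join(difflib.unified_diff(
        hex_lines(reference), hex_lines(candidate),
        fromfile=ref_name, tofile=cand_name))


if __name__ == "__main__":
    # Toy input: the candidate has lost one "3" (0x33) from "2.333 T".
    print(diff_decoded(b"2.333 T", b"2.33 T"), end="")
```

One byte per line keeps every hunk aligned to single bytes, so a dropped byte shows up as exactly one "-" line.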

Note:

It would also be nice if the LZW decoder supported the /EarlyChange parameter.
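
Each byte missing in the diff above sits inside a run of a repeated pattern (33 33 33, 31 31 31, 39 39 39, 30 20 30 20 30). In LZW, such runs are where a decoder receives a code for the table entry it is only just about to create (the "KwKwK" case), so a mistake in that branch is one plausible explanation; the diff alone does not prove it. The following is a minimal Python sketch of a PDF LZWDecode decoder, not PDFsharp's code and not the libtiff-derived fix from this thread. It assumes the conventions of the PDF specification (MSB-first codes of 9 to 12 bits, 256 = clear table, 257 = end of data) and models /EarlyChange as a hypothetical early_change argument with the spec's default of 1. /Predictor is not handled.

```python
CLEAR_TABLE = 256
EOD = 257


def lzw_decode(data: bytes, early_change: int = 1) -> bytes:
    """Decode a PDF /LZWDecode stream (sketch; no predictor support)."""
    def fresh_table() -> list[bytes]:
        # Codes 0-255 are literal bytes; 256 and 257 are control codes.
        return [bytes([i]) for i in range(256)] + [b"", b""]

    table = fresh_table()
    code_len = 9
    prev = None          # string decoded from the previous code
    out = bytearray()
    bit_buf = 0          # holds only bits that have not been consumed yet
    bit_cnt = 0

    for byte in data:
        bit_buf = (bit_buf << 8) | byte
        bit_cnt += 8
        while bit_cnt >= code_len:
            bit_cnt -= code_len
            code = bit_buf >> bit_cnt          # codes are packed MSB first
            bit_buf &= (1 << bit_cnt) - 1      # keep only the unread bits
            if code == CLEAR_TABLE:
                table, code_len, prev = fresh_table(), 9, None
                continue
            if code == EOD:
                return bytes(out)
            if code < len(table):
                entry = table[code]
            elif code == len(table) and prev is not None:
                # KwKwK case: the code names the entry being defined right now,
                # which is the previous string plus its own first byte.
                entry = prev + prev[:1]
            else:
                raise ValueError(f"invalid LZW code {code}")
            out += entry
            if prev is not None and len(table) < 4096:
                table.append(prev + entry[:1])
                # EarlyChange=1 widens the code one table entry sooner than 0.
                if len(table) + early_change >= (1 << code_len) and code_len < 12:
                    code_len += 1
            prev = entry
    return bytes(out)    # stream ended without an EOD code


if __name__ == "__main__":
    # Nine bytes encoding the codes 256 45 258 258 65 259 66 257;
    # the first 258 arrives before entry 258 exists (the KwKwK case).
    print(lzw_decode(bytes.fromhex("800B6050220C0C8501")))  # prints b'-----A---B'
```

The only place /EarlyChange matters in this sketch is the width check after each new table entry; everything else is identical for both settings.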

_________________
Gerben Vos
Developer


Posted: Mon Aug 15, 2016 12:53 pm

Joined: Tue Aug 02, 2016 9:56 am
Posts: 40
Location: Amsterdam, The Netherlands
I fixed this locally by replacing PDFsharp's LZW decoder with the one from libtiff, translated from C to C#.

Since libtiff uses the BSD license and PDFsharp the MIT license, I am not sure whether this can be included in PDFsharp as-is. I also want to investigate further: if I can pin down the difference between the two implementations, I may be able to fix the PDFsharp implementation directly.

Also, the code could probably be C#-ified a bit further.

Code available on request.

_________________
Gerben Vos
Developer


Posted: Thu Aug 09, 2018 5:29 pm

Joined: Tue Aug 02, 2016 9:56 am
Posts: 40
Location: Amsterdam, The Netherlands
I have attached my code. The "2" subdirectory contains the newer, more C#-ified version. While cleaning it up, I may have removed a few lines that seemed non-functional but that I later suspected might still matter, so I have also included an older version in the "1" subdirectory, which is a more direct translation of the libtiff code. Both behave identically on all PDFs in the test set I used.

Both versions are BSD-licensed, like the code they are derived from (although the older version lacks the copyright notice); feel free to redistribute them.

This code compiled with PDFsharp version 1.50-beta3b. I haven't tried it with newer versions.


Attachments:
Pdf.Filters.zip [6.7 KiB]
Downloaded 712 times

_________________
Gerben Vos
Developer