PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC


Posted: Thu Aug 04, 2016 3:15 pm

Joined: Tue Aug 02, 2016 9:56 am
Posts: 40
Location: Amsterdam, The Netherlands
We have two files that cause PdfSharp to hang while reading them. Both are, unfortunately, confidential client files.

One is actually an uncompressed ZIP file (using the store method) containing two PDFs. Because there is some flexibility in searching for the PDF header and trailer, the ZIP header and trailer are skipped, so the header of the first PDF and the trailer of the second PDF are recognized, and hilarity ensues.

The other is a PDF file with an XRef stream that contains a strange xref entry. I am not yet sure whether that entry is actually corrupt or whether there is an additional bug in PdfSharp.

Anyway, in both cases PdfSharp ends up trying to read an object at an incorrect offset in the file, thinks it is a hexadecimal string (which it isn't), and ends up in an infinite loop.
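To illustrate the failure mode described above, here is a minimal Python sketch of a hex-string scanner that cannot loop forever. It is not PdfSharp's lexer and not the attached patch; the function name `scan_hex_digits` and its error handling are assumptions made for illustration only. The idea is that every iteration either consumes a byte or raises, so bogus input at a wrong offset ends in an error instead of a hang.

```python
from typing import Tuple

# PDF whitespace bytes: NUL, TAB, LF, FF, CR, SPACE.
PDF_WHITESPACE = frozenset(b"\x00\t\n\x0c\r ")
HEX_DIGITS = frozenset(b"0123456789abcdefABCDEF")


def scan_hex_digits(data: bytes, pos: int) -> Tuple[bytes, int]:
    """Scan a hex string that starts with '<' at data[pos].

    Returns the raw hex digits (whitespace removed) and the position just
    after the closing '>'. Hypothetical helper, for illustration only.
    """
    if data[pos:pos + 1] != b"<":
        raise ValueError("hex string must start with '<'")
    pos += 1
    digits = bytearray()
    while True:
        if pos >= len(data):
            # End of input reached without '>': fail instead of spinning.
            raise ValueError("unterminated hex string")
        ch = data[pos]
        pos += 1  # always advance, so the loop is guaranteed to terminate
        if ch == ord(">"):
            return bytes(digits), pos
        if ch in PDF_WHITESPACE:
            continue
        if ch not in HEX_DIGITS:
            # Probably not a hex string at all (e.g. a wrong object offset).
            raise ValueError("invalid byte %r in hex string" % bytes([ch]))
        digits.append(ch)
```

For example, `scan_hex_digits(b"<48 65 6C>", 0)` yields the digits `b"48656C"`, while input such as a ZIP local file header after a stray `<` raises `ValueError` rather than hanging.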

If more detail is needed, I can try to construct a non-confidential PDF exhibiting the problem.

Patch: see attachment.

I also fixed, according to the spec but untested, the case where a hexadecimal string ends with an incomplete digit pair (an odd number of hex digits).
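For reference, the PDF specification says that if the final digit of a hexadecimal string is missing, it is assumed to be 0. A minimal Python sketch of that rule follows; the helper name `decode_hex_digits` is an assumption for illustration and is not taken from the attached patch.

```python
def decode_hex_digits(digits: str) -> bytes:
    """Decode the digits of a PDF hex string (without the '<' and '>').

    An odd number of digits is padded with a trailing '0', as the PDF
    specification requires. Hypothetical helper, for illustration only.
    """
    if len(digits) % 2 == 1:
        digits += "0"
    return bytes.fromhex(digits)
```

With this rule, the digits `901FA` decode to the three bytes 0x90, 0x1F, 0xA0.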


Attachments:
pdfsharp-674.zip [522 Bytes]
Downloaded 479 times

_________________
Gerben Vos
Developer
Posted: Thu Aug 04, 2016 3:21 pm

Joined: Tue Aug 02, 2016 9:56 am
Posts: 40
Location: Amsterdam, The Netherlands
Note that the entire Appendix H, "Compatibility and Implementation Notes", which contains the note about scanning the first and last 1024 bytes of the file for the header and trailer, was dropped from ISO 32000-1, the ISO standard that serves as the PDF 1.7 spec.

Also, I wonder: if the header is found at a nonzero offset, are all object offsets in the xref table then relative to the start of that header rather than to the beginning of the file? Has anyone already tested how Adobe handles this?
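The offset question can at least be probed empirically on a given file. The Python sketch below is an illustration only: the names `find_header_offset`, `candidate_positions`, and `looks_like_object` are assumptions, and the code makes no claim about what Adobe or PdfSharp actually do. It locates the `%PDF-` marker within the first 1024 bytes and then checks which interpretation of an xref offset lands on something that looks like an object header.

```python
import re
from typing import Tuple

HEADER_MARKER = b"%PDF-"
# Matches "<object number> <generation> obj" at a given position.
OBJECT_HEADER = re.compile(rb"\d+\s+\d+\s+obj")


def find_header_offset(data: bytes, window: int = 1024) -> int:
    """Return the offset of '%PDF-' within the first `window` bytes, or -1."""
    return data[:window].find(HEADER_MARKER)


def candidate_positions(xref_offset: int, header_offset: int) -> Tuple[int, int]:
    """Absolute position of an object under both interpretations:
    (relative to start of file, relative to start of header)."""
    return xref_offset, header_offset + xref_offset


def looks_like_object(data: bytes, pos: int) -> bool:
    """True if an 'N G obj' header starts exactly at `pos`."""
    return 0 <= pos < len(data) and OBJECT_HEADER.match(data, pos) is not None
```

Running `looks_like_object` on both candidate positions for a few xref entries of a file with leading junk would show which convention the producer of that file used.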

_________________
Gerben Vos
Developer


Posted: Tue Aug 16, 2016 12:49 pm
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3092
Location: Cologne, Germany
Thanks for the patch.


We have seen files that had about 20000 zero bytes after the trailer and the %%EOF mark - and Adobe Reader opens them without complaining.

Too many files do not have the trailer within the last 1024 bytes.
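The effect of a fixed 1024-byte window can be shown with a small Python sketch. This is not PDFsharp's code; the function name `find_startxref` and its `window` parameter are assumptions for illustration. It searches backwards for `%%EOF` and the preceding `startxref` keyword, either in the whole file or only in the last `window` bytes.

```python
from typing import Optional


def find_startxref(data: bytes, window: Optional[int] = None) -> int:
    """Return the offset that follows the last 'startxref' keyword.

    With window=None the whole file is searched backwards; otherwise only
    the last `window` bytes are considered. Raises ValueError if nothing
    usable is found. Hypothetical helper, for illustration only.
    """
    start = 0 if window is None else max(0, len(data) - window)
    eof = data.rfind(b"%%EOF", start)
    if eof < 0:
        raise ValueError("%%EOF marker not found")
    keyword = data.rfind(b"startxref", start, eof)
    if keyword < 0:
        raise ValueError("startxref keyword not found")
    # Everything between the keyword and %%EOF should be one decimal number.
    fields = data[keyword + len(b"startxref"):eof].split()
    if len(fields) != 1 or not fields[0].isdigit():
        raise ValueError("malformed startxref section")
    return int(fields[0].decode("ascii"))
```

On a file with about 20000 zero bytes after `%%EOF`, the call with `window=1024` fails, while the unbounded search still finds the offset.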

I have no idea about your header offset question.

_________________
Regards
Thomas Hoevel
PDFsharp Team

