PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

"file may be corrupt" while it is not
https://forum.pdfsharp.net/viewtopic.php?f=2&t=3717
Page 1 of 1

Author:  JeIC2 [ Tue Jan 23, 2018 9:29 am ]
Post subject:  "file may be corrupt" while it is not

Hello!

I use PDFsharp 1.50.4740-beta5 to merge two PDFs into one file. I previously used 1.32.3057 on my PDFs, but with that version I kept getting the famous "Cannot handle iref streams." error on some other PDFs, so I switched to 1.50. However, with the PDF I attached, and some others, I now get this error:

"Unexpected character '0xffff' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp, please send us your PDF file."

However, it is a legit PDF, as I can open it and read it. Interestingly, it does work with 1.30. But with that version I can't merge it with the PDFs that trigger the iref streams error mentioned above. Is there any way to solve this error? Is this a bug? Help would be appreciated.

Attachments:
PDFthatwontmerge.zip [158.18 KiB]

Author:  Thomas Hoevel [ Tue Jan 23, 2018 10:21 am ]
Post subject:  Re: "file may be corrupt" while it is not

Hi!
JeIC2 wrote:
"However, it is a legit PDF, as I can open it and read it."

The file is corrupt.
At position 164767 a stream begins. The length of the stream is given as 9979 bytes.
The size of the file is just 171690 bytes, so at most 6921 bytes of content are available for that stream, not the 9979 bytes given in the header.
I call that "corrupt".
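
The arithmetic behind that statement can be reproduced with a few lines of Python. This is only an illustrative sketch, not PDFsharp code: the function name check_stream_length is made up, and it assumes the stream data starts at file position 164769 (the position QPDF reports for object 11 0), which is the offset that yields the 6921-byte figure.

```python
def check_stream_length(file_size: int, data_offset: int, declared_length: int) -> dict:
    """Compare a stream's declared /Length with the bytes left in the file.

    Hypothetical helper for illustration; not part of PDFsharp or QPDF.
    """
    # Upper bound on the stream size: everything from the data start to EOF.
    available = file_size - data_offset
    # Where the stream would have to end if the declared length were correct.
    declared_end = data_offset + declared_length
    return {
        "available": available,
        "declared_end": declared_end,
        "overrun": max(0, declared_end - file_size),
        "fits": declared_end <= file_size,
    }


# Numbers taken from this thread; data_offset is an assumption (see above).
result = check_stream_length(file_size=171690, data_offset=164769, declared_length=9979)
print(result)
```

With these inputs the declared end of the stream is 174748, which lies 3058 bytes past the end of the 171690-byte file. That is also the file position at which QPDF reports "EOF while reading token".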

Yes, Adobe Reader can open the file. And when I use "Save as" in Adobe Reader, I get a file that can be opened with PDFsharp.
Once again Adobe Reader does a better job when it comes to dealing with corrupt files. Adobe Reader sets the length of that stream to 390.

There are some pull requests on GitHub that are meant to improve how PDFsharp deals with corrupted files.
We have not evaluated those changes yet, so they are not included in beta5.
Feel free to try them and please let us know if any of those fixes helps with your file.
https://github.com/empira/PDFsharp/pulls

QPDF also identifies the file as corrupt:
Quote:
checking 119406-VKF_926516_20171003_081415.pdf
PDF Version: 1.3
File is not encrypted
File is not linearized
WARNING: 119406-VKF_926516_20171003_081415.pdf (object 11 0, file position 174748): EOF while reading token
WARNING: 119406-VKF_926516_20171003_081415.pdf (object 11 0, file position 164769): attempting to recover stream length
WARNING: 119406-VKF_926516_20171003_081415.pdf (object 11 0, file position 164769): recovered stream length: 1859

It comes up with a different stream length than Adobe Reader.
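
For readers wondering how a stream length can be "recovered" at all: one common heuristic is to ignore the declared length and search for the next "endstream" keyword. The Python below is a deliberately simplified sketch of that idea. It is not the algorithm used by QPDF or Adobe Reader, the function name recover_stream_length is hypothetical, and it makes no attempt to explain the specific values 1859 or 390.

```python
from typing import Optional


def recover_stream_length(data: bytes, data_offset: int) -> Optional[int]:
    """Guess a stream's length by locating the next 'endstream' keyword.

    Simplified illustration only; real tools apply further checks.
    """
    end = data.find(b"endstream", data_offset)
    if end == -1:
        return None  # keyword not found, so there is nothing to recover from
    # The end-of-line marker before 'endstream' is not part of the stream data.
    if data[data_offset:end].endswith(b"\r\n"):
        end -= 2
    elif end > data_offset and data[end - 1:end] in (b"\n", b"\r"):
        end -= 1
    return end - data_offset


# Synthetic object with a wrong /Length, loosely modelled on the case in this thread.
pdf = b"11 0 obj\n<< /Length 9979 >>\nstream\nHello, PDF!\nendstream\nendobj\n"
offset = pdf.index(b"stream\n") + len(b"stream\n")
print(recover_stream_length(pdf, offset))
```

A search like this can be fooled when the stream data itself happens to contain the bytes "endstream", which is one reason why different tools may settle on different lengths for the same damaged file.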

Author:  duvidas85 [ Fri Feb 02, 2018 1:54 pm ]
Post subject:  Re: "file may be corrupt" while it is not

The changes on this pull request fixed the problem for me:
https://github.com/empira/PDFsharp/pull/39

I have been using this code since 3rd of February and it is behaving very well. I have merged more than 500 different PDFs and only saw an error in one file (Unexpected token '\xE3' in PDF stream. The file may be corrupted. If you think this is a bug in PDFsharp)

All times are UTC