PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

PdfSharp.Pdf.IO.PdfReaderException
https://forum.pdfsharp.net/viewtopic.php?f=3&t=3401

Author:  ben.mcintyre [ Thu Jul 28, 2016 1:46 am ]
Post subject:  PdfSharp.Pdf.IO.PdfReaderException

Hi All,

I'm getting a PdfSharp.Pdf.IO.PdfReaderException: Token '6' was not expected.
The PDF I'm reading opens up fine in my PDF reader software, so I don't think it's corrupted.

It happens in PdfReader.Open, at the line marked "<-- exception thrown here" below:

Code:
// Read all indirect objects.
for (int idx = 0; idx < count; idx++)
{
    PdfReference iref = irefs[idx];
    if (iref.Value == null)
    {
        try
        {
            Debug.Assert(document._irefTable.Contains(iref.ObjectID));
            PdfObject pdfObject = parser.ReadObject(null, iref.ObjectID, false, false);  // <-- exception thrown here
            Debug.Assert(pdfObject.Reference == iref);
            pdfObject.Reference = iref;
            Debug.Assert(pdfObject.Reference.Value != null, "Something went wrong.");
            // ... (remainder of the excerpt omitted)


It appears to be choking on the '6 0 obj' line below: object 5 ends with '>>' but has no 'endobj', so the parser finds '6' where it expects 'endobj'. Object 6 has the same problem.
I'm not familiar enough with the PDF spec to know if this is a correctly formed PDF, but the fact that my reader software happily opens it would suggest that this construct should be supported.

Code:
stream
 q 409 0 0 291 0 0 cm /x9 Do Q
endstream
endobj
11 0 obj
   32
endobj
5 0 obj
<< /Type /Font
   /Subtype /Type1
   /Name /f-0-0
   /BaseFont /Helvetica-Bold
   /Encoding /WinAnsiEncoding
>>
6 0 obj
<< /Type /Font
   /Subtype /Type1
   /Name /f-1-0
   /BaseFont /Helvetica
   /Encoding /WinAnsiEncoding
>>
1 0 obj
<< /Type /Pages
   /Kids [ 8 0 R ]
   /Count 1
>>
endobj
12 0 obj
<< /Creator (Win2PDF)
   /Producer (Win2PDF x64 7.6.0 - 2.6.7.1484.3 http://www.win2pdf.com)
   /Title (Unnamed Ironbark Document)
   /Author (Perth.admin)
   /Subject ()
   /Keywords ()
>>
endobj
13 0 obj
<< /Type /Catalog
   /Pages 1 0 R
>>
endobj
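For anyone who wants to confirm which objects are affected before opening the file: the PDF specification (ISO 32000-1, section 7.3.10) brackets every indirect object between the keywords obj and endobj, so a scan of the raw text that tracks the most recent "N G obj" header can flag objects that are never closed. This is a hypothetical helper, not part of PDFsharp; EndobjChecker and FindObjectsMissingEndobj are illustrative names. It assumes the file has been decoded byte-for-byte into a string (for example with ISO-8859-1), skips stream bodies wholesale, and does not understand PDF strings or comments, so treat its output as a hint rather than a verdict.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical diagnostic helper (not part of PDFsharp). It scans raw PDF text and
// reports indirect objects that are followed by another "N G obj" header (or by the
// end of the text) without an intervening "endobj".
public static class EndobjChecker
{
    // Alternatives, tried at each position:
    //   1. a stream body, skipped wholesale so binary data cannot fake a keyword
    //      (\bstream does not match inside "endstream": no word boundary there);
    //   2. an object header such as "6 0 obj";
    //   3. the "endobj" keyword.
    private static readonly Regex Tokens = new Regex(
        @"(?<stream>\bstream\b.*?\bendstream\b)" +
        @"|\b(?<num>\d+)\s+\d+\s+obj\b" +
        @"|(?<end>\bendobj\b)",
        RegexOptions.Singleline);

    public static List<int> FindObjectsMissingEndobj(string pdfText)
    {
        var missing = new List<int>();
        int? open = null; // object number whose "endobj" has not been seen yet

        foreach (Match m in Tokens.Matches(pdfText))
        {
            if (m.Groups["stream"].Success)
                continue; // stream data is not inspected
            if (m.Groups["end"].Success)
            {
                open = null; // the current object was closed properly
                continue;
            }
            // Object header: if the previous object is still open, it lacks "endobj".
            if (open.HasValue)
                missing.Add(open.Value);
            open = int.Parse(m.Groups["num"].Value);
        }
        if (open.HasValue)
            missing.Add(open.Value); // last object never closed
        return missing;
    }
}
```

Run against the excerpt above, this logic should report objects 5 and 6, which lines up with the "Token '6' was not expected" message.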


Thanks for your time.

Ben McIntyre

Author:  Thomas Hoevel [ Thu Jul 28, 2016 8:16 am ]
Post subject:  Re: PdfSharp.Pdf.IO.PdfReaderException

Hi!
ben.mcintyre wrote:
I'm not familiar enough with the PDF spec to know if this is a correctly formed PDF, but the fact that my reader software happily opens it would suggest that this construct should be supported.
How does Adobe Reader behave: does it prompt you to save the file after viewing it? (That prompt usually means Reader had to repair the file.)
Can we get that file for testing?

Many viewers can display files that are not strictly correct ...

Author:  ben.mcintyre [ Tue Aug 02, 2016 7:22 am ]
Post subject:  Re: PdfSharp.Pdf.IO.PdfReaderException

Here's the file. BTW, I don't have Adobe Reader installed. In any case, I have to process these files directly from an email stream, so I can't intervene manually. I may be able to do a text replacement on the relevant part of the file, though.

Attachments:
File comment: sample file
TestInv.zip [18.9 KiB]

Author:  ben.mcintyre [ Wed Aug 10, 2016 3:17 am ]
Post subject:  Re: PdfSharp.Pdf.IO.PdfReaderException

Any progress?
Sorry to hassle you, but I think this one is really easy to diagnose. It just comes down to whether you want to handle it in PDFsharp or treat it as the PDF generator's problem.

Author:  Thomas Hoevel [ Wed Aug 10, 2016 9:15 am ]
Post subject:  Re: PdfSharp.Pdf.IO.PdfReaderException

ben.mcintyre wrote:
Sorry to hassle you, but I think this one is really easy to diagnose.
It's not easy to verify this against the PDF Reference. I assume that "endobj" is required. However, Adobe Reader does not prompt to repair the file, so Adobe Reader accepts it.
I think it is a problem of the PDF generator.

Changing PDFsharp to work without the "endobj" would require substantial changes. We cannot work on this issue in the near future.

Author:  ben.mcintyre [ Thu Aug 11, 2016 12:36 am ]
Post subject:  Re: PdfSharp.Pdf.IO.PdfReaderException

No problem, thanks for the clarification.
I can probably just use a RegEx text replacement to fix up the problem part.
It's also possible the problem would go away if the sender updated the generator to the latest version. Unfortunately, I have no influence over that.

Appreciate your time.
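Ben's RegEx idea can be sketched along these lines. This is a hypothetical workaround, not a PDFsharp feature; EndobjRepair and InsertMissingEndobj are illustrative names. It only targets the pattern in this file, a dictionary's closing >> followed directly by the next object header, and assumes the file was decoded byte-for-byte (e.g. ISO-8859-1) so it can be re-encoded without loss. Two caveats: a pattern this simple could in principle also match inside binary stream data, and every inserted endobj shifts the byte offsets recorded in the cross-reference table, so whether PDFsharp then opens the result depends on how it handles stale offsets, which has not been verified here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical workaround (not part of PDFsharp): insert the missing "endobj"
// wherever a dictionary's closing ">>" is followed directly by the next
// "N G obj" header, which is the pattern seen in the sample file.
public static class EndobjRepair
{
    // ">>", optional whitespace (captured), then a lookahead for an object header.
    // The lookahead does not consume the header, so consecutive broken objects
    // (5 0 obj and 6 0 obj in the sample) are all fixed in one pass.
    private static readonly Regex MissingEndobj =
        new Regex(@">>(\s*)(?=\d+\s+\d+\s+obj\b)");

    public static string InsertMissingEndobj(string pdfText)
    {
        // ${1} re-emits the original whitespace, then the missing keyword is added.
        // Objects that already end in ">> endobj" do not match, so the call is idempotent.
        return MissingEndobj.Replace(pdfText, ">>${1}endobj\n");
    }
}
```

A possible calling sequence, also unverified: decode the bytes with Encoding.GetEncoding("ISO-8859-1"), pass the string through InsertMissingEndobj, re-encode with the same encoding, and hand a MemoryStream over the result to PdfReader.Open(stream, PdfDocumentOpenMode.Import).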

Author:  Thomas Hoevel [ Mon Aug 15, 2016 11:26 am ]
Post subject:  Re: PdfSharp.Pdf.IO.PdfReaderException

I asked Stefan about this.

He says the PDF file shown here is definitely non-conforming.
He thinks PDFsharp could be changed, with reasonable effort, to handle the "5 0" and "6 0" objects shown here even though their "endobj" is missing.
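Stefan's suggestion, tolerating the missing endobj in the parser, can be illustrated with a token-level sketch. It is hypothetical and does not reflect PDFsharp's actual parser internals; TolerantObjectEnd and ReadEndObj are illustrative names, and the plain token list stands in for a real lexer. After an object's value has been read, a strict parser requires endobj; the sketch additionally accepts the case where the next three tokens form another object header.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of the tolerant behaviour described above; PDFsharp's real
// parser is structured differently. Position points at the token that follows
// an object's value.
public sealed class TolerantObjectEnd
{
    private readonly IReadOnlyList<string> _tokens;
    public int Position { get; private set; }

    public TolerantObjectEnd(IReadOnlyList<string> tokens, int position)
    {
        _tokens = tokens;
        Position = position;
    }

    // Returns true if "endobj" was present (and consumes it), false if it was
    // missing but the next object header follows (header left unconsumed).
    // Any other token is rejected, as a strict parser would.
    public bool ReadEndObj()
    {
        if (Peek(0) == "endobj")
        {
            Position++;
            return true;
        }
        if (IsUInt(Peek(0)) && IsUInt(Peek(1)) && Peek(2) == "obj")
            return false; // tolerate: the next read starts at this header
        throw new FormatException($"Token '{Peek(0)}' was not expected.");
    }

    // Returns "" past the end so callers need no bounds checks.
    private string Peek(int offset) =>
        Position + offset < _tokens.Count ? _tokens[Position + offset] : "";

    private static bool IsUInt(string s) => uint.TryParse(s, out _);
}
```

Peeking instead of consuming is the key design choice: the following object's header is still available to the next read, while genuinely unexpected tokens still raise an error.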

At the moment I cannot give an ETA for the next version and I do not know whether or not the next version will include the change.

All times are UTC