PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC




Posted: Thu Jul 28, 2016 1:46 am

Joined: Thu Jul 28, 2016 1:15 am
Posts: 4
Hi All,

I'm getting a PdfSharp.Pdf.IO.PdfReaderException: Token '6' was not expected.
The PDF I'm reading opens up fine in my PDF reader software, so I don't think it's corrupted.

It happens in PdfReader.Open, at this bit (marked *** HERE ***):

Code:
                // Read all indirect objects.
                for (int idx = 0; idx < count; idx++)
                {
                    PdfReference iref = irefs[idx];
                    if (iref.Value == null)
                    {
                        try
                        {
                            Debug.Assert(document._irefTable.Contains(iref.ObjectID));
                            PdfObject pdfObject = parser.ReadObject(null, iref.ObjectID, false, false);  // *** HERE ***
                            Debug.Assert(pdfObject.Reference == iref);
                            pdfObject.Reference = iref;
                            Debug.Assert(pdfObject.Reference.Value != null, "Something went wrong.");


It appears to be choking on the '6 0 obj' line below, expecting an 'endobj' before it.
I'm not familiar enough with the PDF spec to know if this is a correctly formed PDF, but the fact that my reader software happily opens it would suggest that this construct should be supported.

Code:
stream
 q 409 0 0 291 0 0 cm /x9 Do Q
endstream
endobj
11 0 obj
   32
endobj
5 0 obj
<< /Type /Font
   /Subtype /Type1
   /Name /f-0-0
   /BaseFont /Helvetica-Bold
   /Encoding /WinAnsiEncoding
>>
6 0 obj
<< /Type /Font
   /Subtype /Type1
   /Name /f-1-0
   /BaseFont /Helvetica
   /Encoding /WinAnsiEncoding
>>
1 0 obj
<< /Type /Pages
   /Kids [ 8 0 R ]
   /Count 1
>>
endobj
12 0 obj
<< /Creator (Win2PDF)
   /Producer (Win2PDF x64 7.6.0 - 2.6.7.1484.3 http://www.win2pdf.com)
   /Title (Unnamed Ironbark Document)
   /Author (Perth.admin)
   /Subject ()
   /Keywords ()
>>
endobj
13 0 obj
<< /Type /Catalog
   /Pages 1 0 R
>>
endobj
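
The missing "endobj" keywords in the dump above can be found mechanically. The following is a minimal sketch, not PDFsharp code: the class and method names are hypothetical, and it assumes that object headers and "endobj" each start on their own line and that no stream data happens to look like either marker.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of PDFsharp.
static class PdfObjectCheck
{
    // Matches an object header ("6 0 obj") or the keyword "endobj" at the
    // start of a line (after CR, LF, or at the start of the text).
    static readonly Regex Marker = new Regex(
        @"(?<![^\r\n])[ \t]*(?:(?<num>\d+)[ \t]+(?<gen>\d+)[ \t]+obj\b|endobj\b)");

    // Returns "number generation" for every object whose header is followed
    // by another object header before any "endobj" is seen.
    public static List<string> FindUnterminatedObjects(string pdfText)
    {
        var unterminated = new List<string>();
        string open = null; // header of the object that is currently open

        foreach (Match m in Marker.Matches(pdfText))
        {
            if (m.Groups["num"].Success)
            {
                // A new header while the previous object is still open.
                if (open != null)
                    unterminated.Add(open);
                open = m.Groups["num"].Value + " " + m.Groups["gen"].Value;
            }
            else
            {
                // "endobj" closes the current object.
                open = null;
            }
        }
        return unterminated;
    }
}
```

For the excerpt above, this reports the objects "5 0" and "6 0", which are the two font dictionaries without a closing "endobj".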


thanks for your time.

Ben McIntyre


Posted: Thu Jul 28, 2016 8:16 am
empira Employee

Joined: Mon Oct 16, 2006 8:16 am
Posts: 2720
Location: Cologne, Germany
Hi!
ben.mcintyre wrote:
I'm not familiar enough with the PDF spec to know if this is a correctly formed PDF, but the fact that my reader software happily opens it would suggest that this construct should be supported.
What's the behavior of Adobe Reader: does it prompt to save the file after viewing it?
Can we get that file for testing?

Many viewers can open files that are not strictly valid ...

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Tue Aug 02, 2016 7:22 am

Joined: Thu Jul 28, 2016 1:15 am
Posts: 4
Here's the file. BTW, I don't have Adobe Reader installed. I have to process these files directly from an email stream, so I can't intervene manually. I may be able to do a text replacement on the relevant part of the file, though.


Attachments:
File comment: sample file
TestInv.zip [18.9 KiB]

Posted: Wed Aug 10, 2016 3:17 am

Joined: Thu Jul 28, 2016 1:15 am
Posts: 4
Any progress?
Sorry to hassle you, but I think this one is really easy to diagnose. It just comes down to whether you want to handle it in PDFsharp or treat it as the PDF generator's problem.


Posted: Wed Aug 10, 2016 9:15 am
empira Employee

Joined: Mon Oct 16, 2006 8:16 am
Posts: 2720
Location: Cologne, Germany
ben.mcintyre wrote:
sorry to hassle you, but i think this one is really easy to diagnose.
It's not easy to check this against the PDF Reference. I assume that "endobj" is required. However, Adobe Reader does not prompt to fix the file, so it is OK for Adobe Reader.
I think it is a problem of the PDF generator.

Changing PDFsharp to work without the "endobj" would require substantial changes. We cannot work on this issue in the near future.

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Thu Aug 11, 2016 12:36 am

Joined: Thu Jul 28, 2016 1:15 am
Posts: 4
No problem, thanks for the clarification.
I can probably just use a regex text replacement to fix up the problem part.
It's also possible that the problem will go away if the sender updates the generator to the latest version. Unfortunately, I don't have any influence over that.

appreciate your time.
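
The regex replacement idea above can be sketched as follows. This is a hedged sketch under stated assumptions, not a tested fix for this file: the names EndobjRepair and Repair are hypothetical, and the file is assumed to have a single classic cross-reference table (no xref streams, no incremental updates) with object headers on their own lines. A classic xref table stores byte offsets, so inserting "endobj" shifts every later object, and the sketch therefore also adjusts the table and the startxref value.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of PDFsharp.
static class EndobjRepair
{
    const string Patch = "endobj\n";

    // An object header ("6 0 obj") or "endobj" at the start of a line.
    static readonly Regex Marker = new Regex(
        @"(?<![^\r\n])[ \t]*(?:(?<hdr>\d+[ \t]+\d+[ \t]+obj\b)|endobj\b)");
    // The "xref" keyword at the start of a line (does not match "startxref").
    static readonly Regex XrefKeyword = new Regex(@"(?<![^\r\n])xref\b");
    // An in-use xref entry: 10-digit offset, 5-digit generation, "n".
    static readonly Regex XrefEntry = new Regex(@"(?<off>\d{10}) (?<gen>\d{5}) n");
    static readonly Regex StartXref = new Regex(@"(?<kw>startxref\s+)(?<off>\d+)");

    public static string Repair(string pdf)
    {
        // 1. Collect positions of headers that begin while an object is open.
        var cuts = new List<int>();
        bool open = false;
        foreach (Match m in Marker.Matches(pdf))
        {
            if (m.Groups["hdr"].Success)
            {
                if (open) cuts.Add(m.Index);
                open = true;
            }
            else open = false;
        }
        if (cuts.Count == 0) return pdf;

        var xrefs = XrefKeyword.Matches(pdf);
        if (xrefs.Count == 0)
            throw new InvalidOperationException("No classic xref table found.");
        int xrefPos = xrefs[xrefs.Count - 1].Index;

        // New offset = old offset + one patch length per insertion at or before it.
        long Moved(long off) => off + (long)Patch.Length * cuts.Count(c => c <= off);
        long Old(Match m) => long.Parse(m.Groups["off"].Value, CultureInfo.InvariantCulture);

        // 2. Rewrite offsets in the xref table and startxref (old coordinates).
        string tail = pdf.Substring(xrefPos);
        tail = XrefEntry.Replace(tail, m =>
            Moved(Old(m)).ToString("D10", CultureInfo.InvariantCulture)
            + " " + m.Groups["gen"].Value + " n");
        tail = StartXref.Replace(tail, m =>
            m.Groups["kw"].Value + Moved(Old(m)).ToString(CultureInfo.InvariantCulture));

        // 3. Insert the missing keywords, last position first, so that
        //    earlier positions stay valid.
        var sb = new StringBuilder(pdf.Substring(0, xrefPos) + tail);
        for (int i = cuts.Count - 1; i >= 0; i--)
            sb.Insert(cuts[i], Patch);
        return sb.ToString();
    }
}
```

To keep binary stream data intact, the file bytes would need to be decoded and re-encoded with a one-to-one encoding such as Encoding.GetEncoding("ISO-8859-1") rather than UTF-8. Stream data that happens to resemble a marker would confuse the scan, and whether PDFsharp accepts the repaired output has not been verified here.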


Posted: Mon Aug 15, 2016 11:26 am
empira Employee

Joined: Mon Oct 16, 2006 8:16 am
Posts: 2720
Location: Cologne, Germany
I asked Stefan about this.

He says the PDF file shown here is definitely non-conforming.
He thinks it is possible to change PDFsharp with reasonable effort to work with the "5 0" and "6 0" objects shown here even if the "endobj" is missing.

At the moment I cannot give an ETA for the next version and I do not know whether or not the next version will include the change.

_________________
Regards
Thomas Hoevel
PDFsharp Team

