PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

Failed to import on repeating endstream
https://forum.pdfsharp.net/viewtopic.php?f=3&t=3253

Author:  MJLaukala [ Tue Dec 22, 2015 8:42 pm ]
Post subject:  Failed to import on repeating endstream

After testing, I found a bug that was introduced in beta 3 (1.50.3987). If a document contains a repeating 'endstream' symbol (i.e. endstream endstream endobj), the document will fail to import and throw an exception. In the GitHub source, the relevant code is located in https://github.com/empira/PDFsharp/blob/master/src/PdfSharp/Pdf.IO/Parser.cs at lines 293 and 296. Version 1.50.3915 does handle this case, but as its source is not available, I can't point out specifically what changed or why. In the meantime, I will revert to the working version until I can properly integrate the GitHub source into my code base or a fix is made.

If I were to make a fix, it would be as follows:
Code:
// Skip every consecutive 'endstream' token (note: this also accepts none at all).
while ((symbol = ScanNextToken()) == Symbol.EndStream) { }

instead of:
Code:
ReadSymbol(Symbol.EndStream);
symbol = ScanNextToken();

at lines 292-293

Author:  MJLaukala [ Tue Dec 22, 2015 10:55 pm ]
Post subject:  Re: Failed to import on repeating endstream

A better solution would be this:
Code:
ReadSymbol(Symbol.EndStream);  // the first 'endstream' is still required
while ((symbol = ScanNextToken()) == Symbol.EndStream) { }  // skip any duplicates

because we would probably still want an exception thrown if at least one 'endstream' is not where it is expected.
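A minimal, self-contained sketch of this "require one, skip the rest" logic, run against a hard-coded token list instead of a real PDF. `Sym`, `EndStreamSketch` and `ReadStreamEnd` are hypothetical stand-ins for illustration only; they are not PDFsharp's actual `Symbol` enum or `Parser` members, and the real `ReadSymbol`/`ScanNextToken` may behave differently (for example at end of input).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds for this sketch only (not PDFsharp's Symbol enum).
public enum Sym { EndStream, EndObj, Eof }

// Hypothetical, minimal stand-in for the relevant part of a PDF parser.
public class EndStreamSketch
{
    private readonly IReadOnlyList<Sym> _tokens;
    private int _pos;

    public EndStreamSketch(IReadOnlyList<Sym> tokens) => _tokens = tokens;

    // Returns the next token, or Sym.Eof once the list is exhausted.
    private Sym ScanNextToken() => _pos < _tokens.Count ? _tokens[_pos++] : Sym.Eof;

    // Consumes one token and throws if it is not the expected kind.
    private void ReadSymbol(Sym expected)
    {
        Sym actual = ScanNextToken();
        if (actual != expected)
            throw new InvalidOperationException($"Expected {expected}, found {actual}.");
    }

    // Requires one 'endstream', then skips any repeats and returns the
    // first token after them (normally 'endobj').
    public Sym ReadStreamEnd()
    {
        ReadSymbol(Sym.EndStream);
        Sym symbol;
        while ((symbol = ScanNextToken()) == Sym.EndStream)
        {
            // Duplicate 'endstream' keywords are ignored.
        }
        return symbol;
    }
}
```

With the token sequence `endstream endstream endobj`, `ReadStreamEnd` returns `Sym.EndObj`; with `endobj` alone it throws, because the leading `ReadSymbol` call still insists on one `endstream`.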

Author:  () => true [ Thu Dec 24, 2015 9:20 pm ]
Post subject:  Re: Failed to import on repeating endstream

Hi!
MJLaukala wrote:
If a document contains a repeating 'endstream' symbol (i.e. endstream endstream endobj), the document will fail to import and throw an exception.
Is this a valid PDF file? If not, then an exception would be the correct behaviour - at least for a strict mode. An error tolerating mode is a different story.

It would be good to have the PDF file to investigate the problem and verify possible fixes.

Author:  MJLaukala [ Mon Dec 28, 2015 6:07 pm ]
Post subject:  Re: Failed to import on repeating endstream

() => true wrote:
Hi!
MJLaukala wrote:
If a document contains a repeating 'endstream' symbol (i.e. endstream endstream endobj), the document will fail to import and throw an exception.
Is this a valid PDF file? If not, then an exception would be the correct behaviour - at least for a strict mode. An error tolerating mode is a different story.

It would be good to have the PDF file to investigate the problem and verify possible fixes.


As far as validity is concerned, it opens fine in Adobe. I don't currently have a PDF that I can share; I'll try to get one, but it might take time.

I did some further testing. If I remove the extra 'endstream', the file still opens fine in Adobe, but when I attempt to close it, Adobe asks to save the file, presumably to correct for the missing 'endstream' that it expects. I counted the 'stream' and 'endstream' symbols in the original file, and it does have one extra 'endstream'. However, after I removed it and let Adobe save the file, the 'stream' and 'endstream' counts were equal and the stream data appeared to have been adjusted. Given that, I'm guessing there is a particular valid case that requires 'endstreamendstream'. I'll check whether the latest version with source before 1.50 has the same issue, and if not, I'll compare the sources to find out what changed.
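The keyword count described above can be approximated with a rough sketch like the following. It assumes the raw file bytes are decoded as ISO-8859-1 (one char per byte) and matches whole-word `stream` and `endstream` with regular expressions; `StreamKeywordCounter` is a hypothetical helper, not part of PDFsharp. Compressed stream data is binary and may contain these byte sequences by chance, and keywords run together without whitespace (such as `endstreamendstream`) are not matched, so treat the result as a quick sanity check rather than a parse.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class StreamKeywordCounter
{
    // Rough count of standalone 'stream' and 'endstream' keywords in raw PDF bytes.
    // Assumption: ISO-8859-1 maps every byte to exactly one char, so binary
    // content cannot make the decode fail or shift positions.
    public static (int Stream, int EndStream) Count(byte[] pdfBytes)
    {
        string text = Encoding.GetEncoding("ISO-8859-1").GetString(pdfBytes);

        // \b keeps 'stream' from matching the tail of 'endstream'.
        int streams = Regex.Matches(text, @"\bstream\b").Count;
        int endStreams = Regex.Matches(text, @"\bendstream\b").Count;
        return (streams, endStreams);
    }
}
```

For example, `StreamKeywordCounter.Count(File.ReadAllBytes("input.pdf"))` (with `input.pdf` as a placeholder path) would report an `EndStream` count one higher than `Stream` for a file containing a single duplicated `endstream`.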

All times are UTC