PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

Error opening document in Lexer:164
https://forum.pdfsharp.net/viewtopic.php?f=3&t=1424

Author:  nc_kkj [ Wed Nov 17, 2010 10:10 am ]
Post subject:  Error opening document in Lexer:164

I have a pdf document that cannot be opened using the following code:

Code:
             
// Requires: using System.IO; using PdfSharp.Pdf; using PdfSharp.Pdf.IO;
var byteArray = File.ReadAllBytes("test.pdf");
using (var pdfOut = new PdfDocument())
{
    using (var msInput = new MemoryStream(byteArray))
    {
        // Opening the problematic document fails here.
        var inputDocument = PdfReader.Open(msInput, PdfDocumentOpenMode.Import);
        foreach (PdfPage page in inputDocument.Pages)
        {
            pdfOut.AddPage(page);
        }
    }
    using (var msOutput = new FileStream("out.pdf", FileMode.Create, FileAccess.Write))
    {
        pdfOut.Save(msOutput);
    }
}


I am using the latest PDFsharp source code. When I debug with the source, I can see that the error occurs in Lexer.cs line 164, because the lexer does not recognize a soft hyphen (char 173).

I can send you the PDF by e-mail for debugging purposes, but I'm not allowed to publish it on the web.

Could you please look at this error? It would be very much appreciated :)

Author:  Thomas Hoevel [ Wed Nov 17, 2010 2:57 pm ]
Post subject:  Re: Error opening document in Lexer:164

I forwarded your file to our Lexer expert.

I'll post any feedback I get ASAP.
I'll be back in the office on Monday, so there may not be any feedback this week.

Author:  nc_kkj [ Wed Nov 17, 2010 2:59 pm ]
Post subject:  Re: Error opening document in Lexer:164

Thank you for your reply. I await your feedback.

Author:  Thomas Hoevel [ Mon Dec 06, 2010 12:44 pm ]
Post subject:  Re: Error opening document in Lexer:164

Hi!

I got an answer from my boss.

Stefan Lange wrote:
Some streams have a superfluous byte at their end. For example the stream of object 10 ends with:

…É&ã0endstream endobj

There is an unexpected zero ("0") before "endstream". I made the parser skip illegal bytes at the end of streams.
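
A minimal sketch of the idea of skipping stray bytes at the end of a stream. This is not PDFsharp's actual implementation. It assumes the parser knows where the stream data should end and then scans forward, tolerating a limited number of unexpected bytes, until it finds the "endstream" keyword. The names StreamEndScanner and FindEndstream and the maxSkip limit are hypothetical.

```csharp
using System;
using System.Text;

public static class StreamEndScanner
{
    private static readonly byte[] Keyword = Encoding.ASCII.GetBytes("endstream");

    // Returns the index of "endstream" at or after expectedEnd, skipping at most
    // maxSkip unexpected bytes. Returns -1 if the keyword is not found in that window.
    public static int FindEndstream(byte[] data, int expectedEnd, int maxSkip)
    {
        if (data == null) throw new ArgumentNullException("data");
        if (expectedEnd < 0) throw new ArgumentOutOfRangeException("expectedEnd");

        for (int pos = expectedEnd; pos <= expectedEnd + maxSkip; pos++)
        {
            // Not enough bytes left for the keyword to fit.
            if (pos + Keyword.Length > data.Length)
                return -1;
            if (Matches(data, pos))
                return pos;
        }
        return -1;
    }

    // True if the bytes starting at pos spell out the keyword.
    private static bool Matches(byte[] data, int pos)
    {
        for (int i = 0; i < Keyword.Length; i++)
        {
            if (data[pos + i] != Keyword[i])
                return false;
        }
        return true;
    }
}
```

For the bytes of "abc0endstream endobj" with expectedEnd = 3, the scan steps over the stray "0" and returns index 4. Limiting the number of skipped bytes keeps a badly damaged file from being silently accepted.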

Then I ran into the next exception. The creation date was 2010/17/11:

/CreationDate (D:20101711102309)

There is no 17th month, so DateTime throws an exception. I restricted the month to the range 1..12.
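
To illustrate the month fix, here is a minimal sketch (again, not the actual PDFsharp code) that parses the D:YYYYMMDDHHmmSS prefix of a PDF date string and clamps the month to 1..12 before constructing a DateTime. The class name PdfDateSketch and the method ParseLenient are hypothetical, and clamping the other fields (year, day, hour, minute, second) is an additional assumption that goes beyond the fix described above.

```csharp
using System;
using System.Globalization;

public static class PdfDateSketch
{
    // Parses "D:YYYYMMDDHHmmSS" and clamps out-of-range fields instead of throwing.
    // Any time zone suffix after the seconds is ignored in this sketch.
    public static DateTime ParseLenient(string pdfDate)
    {
        if (pdfDate == null || pdfDate.Length < 16 ||
            !pdfDate.StartsWith("D:", StringComparison.Ordinal))
        {
            throw new FormatException("Expected a date of the form D:YYYYMMDDHHmmSS.");
        }

        int year = Clamp(Field(pdfDate, 2, 4), 1, 9999);
        int month = Clamp(Field(pdfDate, 6, 2), 1, 12);   // the fix described above
        int day = Clamp(Field(pdfDate, 8, 2), 1, DateTime.DaysInMonth(year, month));
        int hour = Clamp(Field(pdfDate, 10, 2), 0, 23);
        int minute = Clamp(Field(pdfDate, 12, 2), 0, 59);
        int second = Clamp(Field(pdfDate, 14, 2), 0, 59);

        return new DateTime(year, month, day, hour, minute, second);
    }

    // Reads a fixed-width, digits-only field; throws FormatException otherwise.
    private static int Field(string s, int start, int length)
    {
        return int.Parse(s.Substring(start, length), NumberStyles.None,
                         CultureInfo.InvariantCulture);
    }

    private static int Clamp(int value, int min, int max)
    {
        return Math.Min(Math.Max(value, min), max);
    }
}
```

With the value from this file, ParseLenient("D:20101711102309") yields 11 December 2010, 10:23:09. Clamping only makes the date parseable; it does not recover the date the producing tool actually meant.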

After that hurdle, SharpZipLib throws an exception because it cannot decompress a deflated stream.

I gave up. The tool you use to create the PDF files seems to be very inaccurate. PDFsharp can only read well-formed PDF files. This behavior cannot be changed without rewriting the parsers, so you cannot use PDFsharp to process files like this.


I'm sorry for the bad news.
While Adobe Reader 4.0 displays a creation date for this file, Adobe Acrobat 8.0 does not (because the month is out of range, I presume).

BTW: is it correct that the sixth page of the document is empty?

Author:  nc_kkj [ Mon Dec 06, 2010 1:01 pm ]
Post subject:  Re: Error opening document in Lexer:164

Hi Thomas + boss

Thanks for putting a great effort into this.

I have never had this problem before, and I have used the tool for quite a while for different purposes.

So I was expecting the reason to be something like this. I just needed better information about why this particular PDF failed.

The PDF was not created by me, so I will now pass this information on to the creator so they can fix the problem (probably by updating their tool).

Thanks again for all the help - it helped me a lot :)

Author:  SpaceKat (=^.^=) [ Tue Mar 22, 2011 11:28 pm ]
Post subject:  Re: Error opening document in Lexer:164

Hi There,

I have struck a similar issue, where I am getting the following message:

Illegal character.
Parameter name: data

The error is raised on line 3 below.

My code block looks like this:

PdfDocument input = PdfReader.Open(pdf_in, PdfDocumentOpenMode.ReadOnly);
PdfPage p = input.Pages[0];
PdfDictionary.PdfStream stream = p.Contents.Elements.GetDictionary(0).Stream;
result = new PDFTextExtractor().ExtractTextFromPDFBytes(stream.Value);
p.Close();
input.Dispose();

I am building a process that reads client statements (all PDF files), parses each file to extract the statement date, account number, and other particulars, and then stores the files in a database from which clients can request to download their statements via the website. I have no access to the settlement systems that create the client statements (an external company produces them for us).

For Foreign Exchange and CFD statements the process works flawlessly; however, for many (but not all) Futures statements I receive the above message.

Any ideas how I might resolve this, or escape the offending character?


Many Thanks,

Jason

All times are UTC