PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
All times are UTC


Posted: Wed Nov 17, 2010 10:10 am

Joined: Wed Nov 17, 2010 10:00 am
Posts: 3
I have a PDF document that cannot be opened using the following code:

Code:
             
            // Requires: using System.IO; using PdfSharp.Pdf; using PdfSharp.Pdf.IO;
            var byteArray = File.ReadAllBytes("test.pdf");
            using (var pdfOut = new PdfDocument())
            {
                using (var msInput = new MemoryStream(byteArray))
                {
                    // Import mode is needed so that pages can be added to another document.
                    var inputDocument = PdfReader.Open(msInput, PdfDocumentOpenMode.Import);
                    foreach (PdfPage page in inputDocument.Pages)
                    {
                        pdfOut.AddPage(page);
                    }
                }
                // Write the copied pages to a new file.
                using (var fsOutput = new FileStream("out.pdf", FileMode.Create, FileAccess.Write))
                {
                    pdfOut.Save(fsOutput);
                }
            }


I am using the latest PDFsharp source code. When I debug into it, I can see that the error occurs in Lexer.cs at line 164, because the lexer does not recognize a soft hyphen (char 173).

I can send you the PDF by e-mail for debugging purposes, but I'm not allowed to publish it on the web.

Could you please look at this error? It would be very much appreciated :)


Posted: Wed Nov 17, 2010 2:57 pm
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
I forwarded your file to our Lexer expert.

I'll post any feedback I get ASAP.
I'll be back in the office on Monday, so there may not be any feedback this week.

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Wed Nov 17, 2010 2:59 pm

Joined: Wed Nov 17, 2010 10:00 am
Posts: 3
Thank you for your reply. I await your feedback.


Posted: Mon Dec 06, 2010 12:44 pm
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3095
Location: Cologne, Germany
Hi!

I got an answer from my boss.

Stefan Lange wrote:
Some streams have a superfluous byte at their end. For example the stream of object 10 ends with:

…É&ã0endstream endobj

There is an unexpected zero ("0") before "endstream". I made the parser skip illegal bytes at the end of streams.
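
The idea of skipping stray bytes before "endstream" can be sketched roughly as follows. This is only an illustration and not PDFsharp's actual parser code: the class StreamEndSketch, the method FindEndStream and the maxSkip limit are hypothetical names invented for this example. It assumes the caller already knows the position where the stream data should end according to its declared length.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of PDFsharp.
static class StreamEndSketch
{
    static readonly byte[] EndStream = Encoding.ASCII.GetBytes("endstream");

    // Returns the index of the "endstream" keyword at or after 'position',
    // tolerating up to 'maxSkip' unexpected bytes in front of it.
    // Returns -1 if the keyword is not found within that window.
    public static int FindEndStream(byte[] data, int position, int maxSkip)
    {
        int last = Math.Min(data.Length - EndStream.Length, position + maxSkip);
        for (int i = position; i <= last; i++)
        {
            if (MatchesAt(data, i, EndStream))
                return i;
        }
        return -1;
    }

    // True if 'keyword' occurs in 'data' starting at 'index'.
    static bool MatchesAt(byte[] data, int index, byte[] keyword)
    {
        for (int k = 0; k < keyword.Length; k++)
        {
            if (data[index + k] != keyword[k])
                return false;
        }
        return true;
    }
}
```

The number of skipped bytes is the returned index minus position. A small maxSkip keeps the scan from running into the next object when the keyword is missing entirely.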

Then I ran into the next exception. The creation date was 2010/17/11:

/CreationDate (D:20101711102309)

There is no 17th month, so DateTime throws an exception. I clamped the month to the range 1..12.
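
A minimal sketch of such a tolerant date parser is shown here. It is not the actual PDFsharp change: PdfDateSketch and ParseLenient are hypothetical names, the sketch assumes all 14 digits (YYYYMMDDHHmmSS) are present, it ignores any time zone suffix, and clamping the day and time fields as well is an extra assumption beyond the month fix described above.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of PDFsharp.
static class PdfDateSketch
{
    // Parses "D:YYYYMMDDHHmmSS" and clamps out-of-range fields instead of throwing.
    public static DateTime ParseLenient(string value)
    {
        // The "D:" prefix is optional.
        string s = value.StartsWith("D:", StringComparison.Ordinal) ? value.Substring(2) : value;
        if (s.Length < 14)
            throw new FormatException("Expected 14 digits after the optional D: prefix.");

        int year = Clamp(Field(s, 0, 4), 1, 9999);
        int month = Clamp(Field(s, 4, 2), 1, 12);
        int day = Clamp(Field(s, 6, 2), 1, DateTime.DaysInMonth(year, month));
        int hour = Clamp(Field(s, 8, 2), 0, 23);
        int minute = Clamp(Field(s, 10, 2), 0, 59);
        int second = Clamp(Field(s, 12, 2), 0, 59);
        return new DateTime(year, month, day, hour, minute, second);
    }

    // Reads a fixed-width, digits-only field.
    static int Field(string s, int start, int length)
    {
        return int.Parse(s.Substring(start, length), NumberStyles.None, CultureInfo.InvariantCulture);
    }

    static int Clamp(int value, int min, int max)
    {
        return Math.Max(min, Math.Min(max, value));
    }
}
```

With the value from this file, ParseLenient("D:20101711102309") yields December 11, 2010, 10:23:09, because month 17 is clamped to 12. Whether a clamped date is better than no date at all is a design choice; the result is no longer what the producer intended.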

After that hurdle, SharpZipLib throws an exception because it cannot decompress (inflate) a compressed stream.

I gave up. The tool you use to create the PDF files seems to be very inaccurate. PDFsharp can only read well-formed PDF files. This behavior cannot be changed without rewriting the parsers, so you cannot use PDFsharp to process files like this.


I'm sorry for the bad news.
While Adobe Reader 4.0 displays a creation date for this file, Adobe Acrobat 8.0 does not (because the month is out of range, I presume).

BTW: is it correct that the sixth page of the document is empty?

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Mon Dec 06, 2010 1:01 pm

Joined: Wed Nov 17, 2010 10:00 am
Posts: 3
Hi Thomas + boss

Thanks for putting so much effort into this.

I have never had this problem before, and I have used the tool for quite a while for different purposes.

I suspected the cause would be something like this; I just needed more information about why this particular PDF failed.

The PDF was not created by me, so I will pass this information on to its creator so they can fix the problem (probably by updating their tool).

Thanks again for all the help - it helped me a lot :)


Posted: Tue Mar 22, 2011 11:28 pm

Joined: Tue Mar 22, 2011 11:20 pm
Posts: 1
Hi There,

I have hit a similar issue and am getting the following message:

Illegal character.
Parameter name: data

The error is raised on line 3 of the code below.

My code block looks like this:

PdfDocument input = PdfReader.Open(pdf_in, PdfDocumentOpenMode.ReadOnly);
PdfPage p = input.Pages[0];
PdfDictionary.PdfStream stream = p.Contents.Elements.GetDictionary(0).Stream;
result = new PDFTextExtractor().ExtractTextFromPDFBytes(stream.Value);
p.Close();
input.Dispose();

I am building a process that reads client statements (all PDF files), parses each file to extract the statement date, account number and other particulars, and then stores the files in a database so that clients can download their statements via the website. I have no access to the settlement systems that create the client statements (these are produced for us by an external company).

For Foreign Exchange and CFD statements the process works flawlessly; however, for many (not all) Futures statements I receive the above message.

Any ideas how I might be able to resolve this, or escape the offending character?


Many Thanks,

Jason

