PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

PdfSharp.Pdf.IO.Lexer.ScanNextToken error "not implemented"
https://forum.pdfsharp.net/viewtopic.php?f=2&t=1639

Author:  DotNetSchnauz [ Wed Apr 27, 2011 12:35 pm ]
Post subject:  PdfSharp.Pdf.IO.Lexer.ScanNextToken error "not implemented"

When I attempt to open a PDF that was generated using MS ReportViewer, I invariably hit an assertion in ScanNextToken (Lexer.cs, line 163, version 1.31) saying "Not Implemented".

The character code is 18.

This happens both in my own code and in the excellent PdfMerge application written by Charles Van Lingen, so I am reasonably sure it is not something I am doing wrong.

The PDF in question displays just fine: I ran it through a number of different PDF viewers (Adobe, Foxit, etc.) and they all display it without difficulty.

I tried to attach the offending PDF to this post but the forum does not allow it.

Anyone else run into this problem?

Thanks very much... RKM

Author:  DotNetSchnauz [ Wed Apr 27, 2011 7:24 pm ]
Post subject:  Re: PdfSharp.Pdf.IO.Lexer.ScanNextToken error "not implemented"

So I found at least one problem. In the PDF files generated by MS ReportViewer, the "stream" token has an extra blank (space character) at the end of it.

I modified ScanKeyword so that it handles this case as follows:
Code:
      // Check known tokens
      switch (this.token.ToString().Trim())
      {
        case "obj":
          return this.symbol = Symbol.Obj;
..........


And that made it fail in a different spot. Unfortunately I am still stuck.

Does anybody know how I can figure out what is wrong with this PDF (from PDFsharp's perspective)?

Any help gratefully received. Thanks... RKM

Author:  DotNetSchnauz [ Wed Apr 27, 2011 7:59 pm ]
Post subject:  Re: PdfSharp.Pdf.IO.Lexer.ScanNextToken error "not implemented"

So... I found a way to make this work (for my case anyway) with a few changes to Lexer.cs.

I changed ScanNextToken at line 158 to use a more restrictive custom IsLetter(ch) function instead of Char.IsLetter(ch). Using the debugger I observed Char.IsLetter() returning true for some really strange character values (in the high 200s, for example), and I theorized that these should not be treated as letters.

Code:
      if (IsLetter(ch))
        return this.symbol = ScanKeyword();


The custom IsLetter is implemented as follows:

Code:
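    // Restrictive replacement for Char.IsLetter: accepts ASCII letters only.
    // The space character is accepted too, so a trailing blank ends up in the
    // keyword token (it is trimmed later in ScanKeyword).
    private bool IsLetter(char c)
    {
        if (c == ' ') return true;
        if (c >= 'a' && c <= 'z') return true;
        if (c >= 'A' && c <= 'Z') return true;
        return false;
    }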
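
For comparison, here is a minimal standalone sketch of how Char.IsLetter differs from an ASCII-only test. Char.IsLetter follows Unicode categories, so a byte value above 127 that has been widened to a char (233, which is 'é', for example) counts as a letter, while control code 18 does not. The class name LetterCheck and the helper IsAsciiLetter are made up for this illustration and are not part of PDFsharp. Unlike the custom IsLetter above, this helper does not accept the space character.

```csharp
using System;

public static class LetterCheck
{
    // Hypothetical helper: true only for the 52 ASCII letters.
    public static bool IsAsciiLetter(char c)
    {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void Main()
    {
        // 18 is a control character, 32 is a space, 65 is 'A',
        // 233 and 255 are letters in the Latin-1 range.
        foreach (int code in new[] { 18, 32, 65, 233, 255 })
        {
            char c = (char)code;
            Console.WriteLine("{0,3}: Char.IsLetter={1}, IsAsciiLetter={2}",
                code, Char.IsLetter(c), IsAsciiLetter(c));
        }
    }
}
```

Running it shows which of the sample codes the two tests disagree on; the disagreements are all above 127.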


Then I changed ScanKeyword() in two spots....

On line 307 I replaced Char.IsLetter(ch) with the same custom IsLetter(ch) method.

Code:
      while (true)
      {
        if (IsLetter(ch))
        {
          this.token.Append(ch);


And on line 317 I changed the switch statement to trim errant whitespace characters from the token before comparing it:

Code:
      // Check known tokens
      switch (this.token.ToString().Trim())
      {



Like I say, this worked for me, but there is a very real chance (almost a certainty) that I screwed something up by doing this. I would love feedback from someone who is a little more conversant in the PDF format than I am, to tell me what I may have messed up with these changes.
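
As a reference point for anyone reviewing these changes: the PDF specification (ISO 32000-1, section 7.2.2) sorts every byte into one of three classes, white-space, delimiter, or regular. The following is only a sketch of that classification, not PDFsharp code; the names PdfCharClass, Kind, and Classify are invented for the example. Under this scheme the space is white-space, which is where treating it as a letter departs from the spec.

```csharp
using System;

public static class PdfCharClass
{
    public enum Kind { WhiteSpace, Delimiter, Regular }

    // Classify one byte according to the PDF character set rules.
    public static Kind Classify(byte b)
    {
        switch ((char)b)
        {
            // White-space: NUL, TAB, LF, FF, CR, SPACE
            case '\0': case '\t': case '\n': case '\f': case '\r': case ' ':
                return Kind.WhiteSpace;
            // Delimiters: ( ) < > [ ] { } / %
            case '(': case ')': case '<': case '>': case '[': case ']':
            case '{': case '}': case '/': case '%':
                return Kind.Delimiter;
            // Everything else is a regular character.
            default:
                return Kind.Regular;
        }
    }

    public static void Main()
    {
        foreach (byte b in new byte[] { 32, (byte)'s', (byte)'/', 18 })
        {
            Console.WriteLine("{0,3}: {1}", b, Classify(b));
        }
    }
}
```

A keyword such as "stream" consists of regular characters and ends at the first white-space or delimiter byte, so a lexer built on this classification would not need to trim the token afterwards.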

Thanks...

RKM

All times are UTC