I've been working on a text extractor for PDFs using the PDFsharp library. First and foremost, I'd like to thank everyone who has worked on this library. It's been a ton of help, and I would have given up on this project a long time ago without it.
Things are coming along quite well, and for the most part I've finished this task. However, any content that uses fonts requiring a CMap doesn't extract correctly (understandably, since their byte codes have to be mapped to Unicode values through that CMap). Are there any PDFsharp classes that can help with this? I could always go into the ToUnicode stream and parse it myself, but I don't believe in reinventing the wheel, so I figured I'd ask. I've noticed PdfSharp.Fonts.CMapInfo but am unsure of its usage.
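If no PDFsharp class fits, the do-it-yourself route is not very large. Below is a minimal sketch in Python, chosen only for brevity; the same logic ports directly to C#. It assumes you have already pulled the ToUnicode stream out of the font dictionary and decompressed it to text. It reads the `bfchar` entries (`<srcCode> <dstString>`) and both `bfrange` forms (`<lo> <hi> <dstStart>` and `<lo> <hi> [<dst1> <dst2> ...]`), where destination strings are UTF-16BE. The function names `parse_to_unicode_cmap` and `decode_codes` are hypothetical and are not part of PDFsharp.

```python
from __future__ import annotations

import re

# Hex-string patterns; assumes no whitespace inside <...> (true for most producers).
CHAR_RE = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]*)>")
RANGE_RE = re.compile(
    r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*(\[[^\]]*\]|<[0-9A-Fa-f]+>)"
)


def _utf16be(raw: bytes) -> str:
    # ToUnicode destination strings are encoded as UTF-16BE.
    return raw.decode("utf-16-be", errors="replace")


def parse_to_unicode_cmap(cmap_text: str) -> dict[int, str]:
    """Hypothetical helper: map character codes to Unicode text.

    Simplification: codes are keyed by integer value, so codes of
    different byte widths (e.g. <41> and <0041>) would collide.
    """
    mapping: dict[int, str] = {}

    # bfchar entries: one source code -> one destination string.
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in CHAR_RE.findall(block):
            mapping[int(src, 16)] = _utf16be(bytes.fromhex(dst))

    # bfrange entries: a contiguous run of source codes.
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst in RANGE_RE.findall(block):
            lo_code, hi_code = int(lo, 16), int(hi, 16)
            if dst.startswith("["):
                # Array form: one destination string per code in the range.
                items = re.findall(r"<([0-9A-Fa-f]*)>", dst)
                for offset, item in enumerate(items[: hi_code - lo_code + 1]):
                    mapping[lo_code + offset] = _utf16be(bytes.fromhex(item))
            else:
                # Single destination: each successive code maps to the next
                # value. This increments the whole destination value, which
                # matches incrementing its last byte when that byte does not overflow.
                start = bytes.fromhex(dst[1:-1])
                base = int.from_bytes(start, "big")
                for offset in range(hi_code - lo_code + 1):
                    value = (base + offset).to_bytes(len(start), "big")
                    mapping[lo_code + offset] = _utf16be(value)
    return mapping


def decode_codes(data: bytes, mapping: dict[int, str], code_width: int = 2) -> str:
    """Hypothetical helper: decode a shown string using fixed-width codes.

    Assumes every code has the same width (2 bytes is typical for
    Identity-H fonts); unmapped codes become U+FFFD.
    """
    chars = []
    for i in range(0, len(data) - code_width + 1, code_width):
        code = int.from_bytes(data[i : i + code_width], "big")
        chars.append(mapping.get(code, "\ufffd"))
    return "".join(chars)


if __name__ == "__main__":
    sample = """
    begincmap
    2 beginbfchar
    <0003> <0020>
    <0024> <0041>
    endbfchar
    2 beginbfrange
    <0044> <0046> <0061>
    <0050> <0051> [<0070> <00660069>]
    endbfrange
    endcmap
    """
    cmap = parse_to_unicode_cmap(sample)
    print(decode_codes(bytes.fromhex("0024004400450046"), cmap))
```

Running the sample prints `Aabc`. Real CMaps also declare their code widths in `begincodespacerange` sections, and a robust extractor should use those widths instead of the fixed-width assumption made here.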