PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

Issues with unicode characters in the 0x1Fxxx range.
https://forum.pdfsharp.net/viewtopic.php?f=2&t=4230

Author:  ejmw [ Wed Feb 03, 2021 4:07 pm ]
Post subject:  Issues with unicode characters in the 0x1Fxxx range.

Hello,

I'm hoping someone can help me with an issue drawing text that includes Unicode characters with code points at or above 0x10000. I am using a font that includes these characters, but each one renders as two question-mark boxes, which leads me to believe that PdfSharp may be treating each of these values as two separate characters. Is there some other configuration option I am missing to get these characters to work correctly?

Below is code from a standalone C# console app that reproduces the problem. I am using version 1.50.5147 of the PdfSharp NuGet package.

Code:
   
using PdfSharp.Drawing;
using PdfSharp.Fonts;
using PdfSharp.Pdf;

class Program
{
    static void Main(string[] args)
    {
        GlobalFontSettings.FontResolver = new FontResolver();

        var pdf = new PdfDocument();
        var page = pdf.AddPage();
        page.Orientation = PdfSharp.PageOrientation.Portrait;
        page.Size = PdfSharp.PageSize.Letter;

        using (var pdfDraw = XGraphics.FromPdfPage(page))
        {
            // I tried passing this to the XFont constructor and it didn't make any difference
            // var pdfFontOptions = new XPdfFontOptions(PdfFontEncoding.Unicode);

            var font = new XFont("Symbola", 12, XFontStyle.Regular);

            // Rendering this character (U+1F5B3, >= 0x10000) does not work - I see two [?][?] boxes in the PDF.
            pdfDraw.DrawString(System.Net.WebUtility.HtmlDecode(" 🖳 "), font, XBrushes.Black, new XPoint(100, 200), XStringFormats.BaseLineLeft);
            pdfDraw.DrawString("\U0001F5B3", font, XBrushes.Black, new XPoint(100, 300), XStringFormats.BaseLineLeft);

            // Rendering this character (U+25BA, < 0x10000) works as expected.
            pdfDraw.DrawString(System.Net.WebUtility.HtmlDecode(" &#9658;"), font, XBrushes.Black, new XPoint(200, 200), XStringFormats.BaseLineLeft);
            pdfDraw.DrawString("\U000025BA", font, XBrushes.Black, new XPoint(200, 300), XStringFormats.BaseLineLeft);

            pdf.Save("C:\\temp\\result.pdf");
        }
    }

    // Serves Symbola from disk; every other family falls back to the platform resolver.
    public class FontResolver : IFontResolver
    {
        public byte[] GetFont(string faceName)
        {
            if (faceName == "Symbola")
            {
                return System.IO.File.ReadAllBytes("C:\\temp\\Symbola.ttf");
            }
            return null;
        }

        public FontResolverInfo ResolveTypeface(string familyName, bool isBold, bool isItalic)
        {
            if (familyName == "Symbola")
            {
                return new FontResolverInfo("Symbola");
            }

            return PlatformFontResolver.ResolveTypeface(familyName, isBold, isItalic);
        }
    }
}
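The two boxes line up with how .NET stores text. Strings are UTF-16, so a code point above U+FFFF such as U+1F5B3 occupies two `char` values (a high surrogate followed by a low surrogate), while U+25BA fits in one. The minimal sketch below uses only the .NET base class library, not PDFsharp, to show that split. That PDFsharp 1.50 then maps each `char` to a glyph on its own is an assumption consistent with the symptom, not something checked against its source.

```csharp
using System;

class SurrogateDemo
{
    static void Main()
    {
        string above = "\U0001F5B3"; // U+1F5B3, outside the Basic Multilingual Plane
        string below = "\u25BA";     // U+25BA, inside the Basic Multilingual Plane

        // UTF-16 needs two code units for U+1F5B3 but only one for U+25BA.
        Console.WriteLine(above.Length); // 2
        Console.WriteLine(below.Length); // 1

        // The two code units are a high surrogate followed by a low surrogate.
        Console.WriteLine($"{(int)above[0]:X4} {(int)above[1]:X4}"); // D83D DDB3
        Console.WriteLine(char.IsSurrogatePair(above[0], above[1])); // True

        // Combining the pair recovers the original code point.
        Console.WriteLine($"{char.ConvertToUtf32(above[0], above[1]):X}"); // 1F5B3
    }
}
```

A renderer that looks up glyphs per `char` would ask the font for 0xD83D and 0xDDB3, which are not characters and have no glyphs, hence one missing-glyph box for each half.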

Author:  TH-Soft [ Thu Feb 04, 2021 11:47 am ]
Post subject:  Re: Issues with unicode characters in the 0x1Fxxx range.

ejmw wrote:
"I am using a font that includes these characters, but they are rendering with two question mark boxes, which leads me to believe that PdfSharp may be treating these values as two separate characters."
PDFsharp does not support surrogate pairs yet.
IIRC there is a pull request on GitHub that adds support for surrogate pairs, so you could get the PDFsharp source code and apply that pull request.
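Supporting surrogate pairs essentially means walking the string by code point instead of by `char`, merging each high/low pair into one value before the glyph lookup. The sketch below shows only that iteration step in plain C#. `CodePoints` is a hypothetical helper name, not a PDFsharp API, and this is not the code from the pull request mentioned above.

```csharp
using System;
using System.Collections.Generic;

static class CodePointDemo
{
    // Hypothetical helper (not part of PDFsharp): yields one Unicode code point
    // per step, merging each valid high/low surrogate pair into a single value.
    public static IEnumerable<int> CodePoints(string text)
    {
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsHighSurrogate(text[i]) && i + 1 < text.Length && char.IsLowSurrogate(text[i + 1]))
            {
                yield return char.ConvertToUtf32(text[i], text[i + 1]);
                i++; // skip the low surrogate that was just consumed
            }
            else
            {
                // BMP character, or an unpaired surrogate passed through unchanged.
                yield return text[i];
            }
        }
    }

    static void Main()
    {
        // Same characters as the repro: U+1F5B3 (surrogate pair) and U+25BA (single char).
        foreach (int cp in CodePoints("\U0001F5B3\u25BA"))
            Console.WriteLine($"U+{cp:X4}");
        // Prints U+1F5B3, then U+25BA.
    }
}
```

On .NET Core 3.0 and later, `string.EnumerateRunes()` performs the same walk, except that it reports unpaired surrogates as U+FFFD instead of passing them through.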

Author:  ejmw [ Thu Feb 04, 2021 3:11 pm ]
Post subject:  Re: Issues with unicode characters in the 0x1Fxxx range.

OK, thanks for the info. I found that pull request and will give it a shot.

Thank you for the quick reply!

All times are UTC