PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

PDF pages are blank after splitting with PDFSharp
https://forum.pdfsharp.net/viewtopic.php?f=2&t=3288

Author:  AgileDotnetter [ Fri Feb 05, 2016 3:26 pm ]
Post subject:  PDF pages are blank after splitting with PDFSharp

I haven't seen this particular situation mentioned anywhere on the boards.

I have a project where a vendor is sending us a single PDF containing a number of individual documents (1,800 individual documents, to be exact). Using PDFSharp, we created a routine to split these individual documents into their own PDF. This process splits the documents out correctly.

However, we are unable to open the resulting split PDFs with Adobe Acrobat or Adobe Reader. They open fine in Foxit, but we can't go to production unless they also open in Adobe products. We get the following error message on each page:

Dictionary keys must be direct name objects.

All pages in the document are blank, although their snapshots have the correct page dimensions/orientation.

I'll include the code below. I'd appreciate it if someone could tell me if I'm doing something wrong here, but if this is simply a limitation of PDFSharp I need to know so I can find a substitute tool.

Thanks for the assistance!

Nate Southerland

Code follows:
Code:
   //Clipped
            using (var combinedPdf = LoadCombinedPdf())
            {
                var firstPageIndices = GetEobFirstPageIndicesFromCombinedPdf(combinedPdf).ToArray();
                var eobPackets = SplitIntoEobPackets(combinedPdf, firstPageIndices);
                SaveEobPackets(eobPackets);
            }
   //Clipped


        private PdfDocument LoadCombinedPdf()
        {
            if (Directory.Exists(sourceDirectory))
            {
                var pdfsInSourceDirectory = Directory.GetFiles(sourceDirectory, "*.pdf");
                return PdfReader.Open(pdfsInSourceDirectory.First(), PdfDocumentOpenMode.Import);
            }
            return null;
        }

        private IEnumerable<int> GetEobFirstPageIndicesFromCombinedPdf(PdfDocument combinedPdf)
        {
            for (int i = 0; i < combinedPdf.PageCount; i++)
            {
                var text = combinedPdf.Pages[i].ExtractText();
                if (text.Contains(newPageToken)) 
                    yield return i;
            }
        }

        private IEnumerable<EobPdf> SplitIntoEobPackets(PdfDocument combinedPdf, int[] firstPageIndices)
        {
            var packets = new List<EobPdf>();
            foreach (var index in firstPageIndices)
            {
                var eobPdf = ConfigureNewEobPdf(combinedPdf.Pages[index]);
                var newDoc = new PdfDocument();
                newDoc.Version = combinedPdf.Version;
                newDoc.Info.Creator = "Creator";
                var lastPageIndex = combinedPdf.Pages.Count - 1;
                var goToNextPage = true;
                for (int i = index; goToNextPage; i++)
                {
                    newDoc.AddPage(combinedPdf.Pages[i]);
                    var nextIndex = i + 1;
                    goToNextPage = nextIndex <= lastPageIndex && !firstPageIndices.Contains(nextIndex);
                }
                eobPdf.Content = newDoc;
                packets.Add(eobPdf);
            }
            return packets;
        }

        private EobPdf ConfigureNewEobPdf(PdfPage eobFrontPage)
        {
            var rawMemberId = GetMemberId(eobFrontPage);
            var memberId = string.IsNullOrWhiteSpace(rawMemberId) ? "ERR" : rawMemberId;
            var letterDate = GetLetterDate(eobFrontPage);
            return new EobPdf()
            {
                MemberID = memberId,
                EobDate = letterDate,
                FullFileName = string.Format("{0}\\{1:yyyyMMdd}-{2} FileName.pdf", targetDirectory, letterDate, memberId)
            };
        }

        private static string GetMemberId(PdfPage eobFrontPage)
        {
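            // FirstOrDefault avoids an exception when no line contains "Member ID:"; the caller maps an empty result to "ERR".
            var memberIdLine = eobFrontPage.ExtractText().FirstOrDefault(p => p.IndexOf("Member ID:") > -1);
            if (memberIdLine == null)
                return string.Empty;
            var splitMemberIdLine = memberIdLine.Split(':');
            if (splitMemberIdLine.Length > 1)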
                return splitMemberIdLine[1].Trim();
            return string.Empty;
        }

        private DateTime GetLetterDate(PdfPage eobFrontPage)
        {
            var contents = eobFrontPage.ExtractText();
            DateTime letterDate = DateTime.MinValue;
            foreach (var phrase in contents)
            {
                if (DateTime.TryParse(phrase, out letterDate))
                    return letterDate;
            }
            return DateTime.MinValue;
        }

        private void SaveEobPackets(IEnumerable<EobPdf> eobPackets)
        {
            foreach (var eob in eobPackets)
            {
                if (Directory.Exists(targetDirectory))
                    SaveFile(eob);
            }
        }

        private void SaveFile(EobPdf file, int iterations = 0)
        {
            var newFilename = iterations == 0 ? file.FullFileName :
                GetIterativeFilename(file.FullFileName, iterations);
            if (File.Exists(newFilename))
                SaveFile(file, iterations + 1);
            else
                file.Content.Save(newFilename);
        }

        private string GetIterativeFilename(string fullFileName, int iterations)
        {
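            // Path APIs keep dots in directory or member names from truncating the path, as Split('.') would.
            var directory = Path.GetDirectoryName(fullFileName) ?? string.Empty;
            var baseName = Path.GetFileNameWithoutExtension(fullFileName);
            return Path.Combine(directory, string.Format("{0}{1}.pdf", baseName, iterations));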
        }
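
The page-range bookkeeping in SplitIntoEobPackets can also be worked out before any pages are copied. The following is a minimal sketch that does not touch PDFsharp at all; the names PageRange and PageRanges.FromFirstPages are hypothetical and not part of the code above. It assumes each packet runs from its first-page index up to the page before the next packet starts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper type: an inclusive range of zero-based page indices.
public struct PageRange
{
    public readonly int First;
    public readonly int Last;

    public PageRange(int first, int last)
    {
        First = first;
        Last = last;
    }
}

public static class PageRanges
{
    // Turns the first-page index of each packet into inclusive page ranges.
    // A packet ends one page before the next packet starts; the last packet
    // ends on the final page of the combined document.
    public static List<PageRange> FromFirstPages(IEnumerable<int> firstPageIndices, int pageCount)
    {
        var starts = firstPageIndices.Distinct().OrderBy(i => i).ToList();
        if (starts.Any(i => i < 0 || i >= pageCount))
            throw new ArgumentOutOfRangeException("firstPageIndices");

        var ranges = new List<PageRange>();
        for (int n = 0; n < starts.Count; n++)
        {
            int nextStart = n + 1 < starts.Count ? starts[n + 1] : pageCount;
            ranges.Add(new PageRange(starts[n], nextStart - 1));
        }
        return ranges;
    }
}
```

With ranges in hand, the inner loop reduces to copying pages from range.First through range.Last, and the repeated firstPageIndices.Contains(nextIndex) lookup is no longer needed.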

Author:  TH-Soft [ Fri Feb 05, 2016 4:18 pm ]
Post subject:  Re: PDF pages are blank after splitting with PDFSharp

Hi!

Could be a bug in PDFsharp.

Do you get the message "Dictionary keys must be direct name objects." for any PDF file you split or is this message specific to some PDF files only?
We'd be wasting our time if the error "Dictionary keys must be direct name objects." only comes with specific PDF files.

Do you use the latest version of PDFsharp 1.50?

Author:  AgileDotnetter [ Fri Feb 05, 2016 5:25 pm ]
Post subject:  Re: PDF pages are blank after splitting with PDFSharp

We're using 1.50. As a side note, the only 1.50 NuGet package is the beta version. Is that correct?

We get the message on every new page in every split PDF opened with an Adobe product - Acrobat or Reader.

I attempted to attach sample PDF files to this post, but wasn't able to. (Error message: "The image file you tried to attach is invalid.") I can email them separately if you'd like.

Thanks for the quick response!

Nate

Author:  TH-Soft [ Sat Feb 06, 2016 10:21 am ]
Post subject:  Re: PDF pages are blank after splitting with PDFSharp

ZIP the PDF. You can upload ZIP files up to 256 kiB to the forum.
https://forum.pdfsharp.net/viewtopic.php?f=2&t=832

If the ZIP file is larger, then maybe use a file hoster.

Author:  AgileDotnetter [ Wed Feb 10, 2016 1:45 pm ]
Post subject:  Re: PDF pages are blank after splitting with PDFSharp

Thanks for the quick response, Thomas, and sorry for the delay. Other projects took priority for a bit.

Uploaded the zip file. The file contains the original sample file as well as the results of the splitting process.

Attachments:
FirstCare Sample.zip [200.89 KiB]

Author:  AgileDotnetter [ Thu Feb 18, 2016 5:39 pm ]
Post subject:  Re: PDF pages are blank after splitting with PDFSharp

Hey folks,

I'm afraid this thread has rolled into the corner and been forgotten. If I can't get an answer, we're going to have to look elsewhere for a commercial solution. We're probably willing to pay for support at this point, but the purchase page doesn't have any information about how to move forward.

Thanks to Thomas for his pro bono attention on this so far.

I'll check back in occasionally to see if there are any updates to the thread.

Thanks,

Nate

Author:  AgileDotnetter [ Thu Feb 18, 2016 10:53 pm ]
Post subject:  Re: PDF pages are blank after splitting with PDFSharp

Update:

I've discovered something interesting.

I ran a different, run-of-the-mill PDF through a simplified test version of the splitting process. This simplified test process reads the source file, splits the first two pages into new PDF documents, and saves them to the target directory.

Code:
            var rootDirectory = @"C:\PdfSpliterTests\";
            var sourceFilePath = string.Format("{0}normal.pdf", rootDirectory);
            var sourceDoc = PdfReader.Open(sourceFilePath, PdfDocumentOpenMode.Import);
            for (int i = 0; i < 2; i++)
            {
                var newDoc = new PdfDocument();
                newDoc.AddPage(sourceDoc.Pages[i]);
                var filepath = string.Format(@"{0}Target\Test Result {1}.pdf", rootDirectory, i);
                newDoc.Save(filepath);
            }


The resulting PDFs opened without issue. However, the original data PDF from our vendor still had the issue when run through this simplified process. So it doesn't appear that PDFSharp itself is the issue, unless there's some element in the data PDF that PDFSharp is altering in some errant way. It may be that I'm missing some step in creating the split PDF file that is causing a bad dictionary name to be entered.

Is there a simple way I can find which dictionary entry is the offending entry using PDFSharp? That could help me see if I'm missing a step, or whether there's some element that's running afoul of good naming conventions.
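
One rough way to approach that question is to collect the dictionary keys as strings and flag any that do not look like well-formed PDF name tokens. The sketch below is only the checking half and is independent of PDFsharp; PdfNameCheck and its methods are hypothetical names. It assumes the keys are available as strings with a leading slash (in PDFsharp this would presumably be the Elements.Keys collection of a PdfDictionary, which is worth verifying against the version in use). The check is deliberately conservative and flags more than the PDF specification strictly forbids. A clean result would not rule out the library, since the Adobe message may come from how an object is written to the file rather than from the key text itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PdfNameCheck
{
    // Delimiter characters that may not appear unescaped inside a PDF name.
    private const string Delimiters = "()<>[]{}/%";

    // True if the key looks like a well-formed name token: a leading '/',
    // then printable ASCII without whitespace or delimiters, with '#'
    // used only as the start of a two-digit hex escape.
    public static bool LooksLikeValidName(string key)
    {
        if (string.IsNullOrEmpty(key) || key[0] != '/')
            return false;

        for (int i = 1; i < key.Length; i++)
        {
            char c = key[i];
            if (c <= ' ' || c > '~' || Delimiters.IndexOf(c) >= 0)
                return false;

            if (c == '#')
            {
                // A '#' must be followed by exactly two hex digits.
                if (i + 2 >= key.Length)
                    return false;
                if (!Uri.IsHexDigit(key[i + 1]) || !Uri.IsHexDigit(key[i + 2]))
                    return false;
                i += 2;
            }
        }
        return true;
    }

    // Returns the keys that fail the check, in their original order.
    public static List<string> SuspiciousKeys(IEnumerable<string> keys)
    {
        return keys.Where(k => !LooksLikeValidName(k)).ToList();
    }
}
```

Running SuspiciousKeys over the keys of each dictionary in both the source file and a split file, then comparing the two lists, would show whether a malformed key already exists in the vendor's PDF or only appears after splitting.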

Author:  AgileDotnetter [ Wed Feb 24, 2016 8:16 pm ]
Post subject:  Re: PDF pages are blank after splitting with PDFSharp

Update, part deux:

Never mind. Ultimately this was the same old issue with PDFSharp not fully implementing the PDF 1.5 spec yet. (See FAQ here) We abandoned PDFSharp for a different tool.
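
The thread does not say which PDF 1.5 feature was involved. Object streams and cross-reference streams are two features introduced in that version, so a rough triage step is to check whether a problem file declares version 1.5 or later and whether it appears to use either feature. The sketch below is a heuristic text scan over the raw bytes, not a PDF parser, and PdfFeatureProbe is a hypothetical name. The header version is only a hint, and a marker could in principle appear by chance inside binary stream data.

```csharp
using System;
using System.Text;

public static class PdfFeatureProbe
{
    // Latin-1 maps every byte to exactly one char, so binary stream data
    // cannot break the decoding step.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Returns the "x.y" part of the "%PDF-x.y" header, or null if no
    // header is found within the first 1024 bytes.
    public static string HeaderVersion(byte[] pdfBytes)
    {
        string head = Latin1.GetString(pdfBytes, 0, Math.Min(pdfBytes.Length, 1024));
        int marker = head.IndexOf("%PDF-", StringComparison.Ordinal);
        if (marker < 0 || marker + 8 > head.Length)
            return null;
        return head.Substring(marker + 5, 3);
    }

    // Heuristic: object streams carry "/ObjStm" in their uncompressed stream dictionary.
    public static bool MentionsObjectStreams(byte[] pdfBytes)
    {
        return Latin1.GetString(pdfBytes).IndexOf("/ObjStm", StringComparison.Ordinal) >= 0;
    }

    // Heuristic: cross-reference streams carry "/Type /XRef" (with or without the space).
    public static bool MentionsXRefStreams(byte[] pdfBytes)
    {
        string text = Latin1.GetString(pdfBytes);
        return text.IndexOf("/Type/XRef", StringComparison.Ordinal) >= 0
            || text.IndexOf("/Type /XRef", StringComparison.Ordinal) >= 0;
    }
}
```

Passing the result of File.ReadAllBytes for both the vendor file and the run-of-the-mill test file would show whether the two differ in these respects, which is one way to narrow down why only one of them splits cleanly.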

All times are UTC