PDFsharp & MigraDoc Foundation • View topic - PDF pages are blank after splitting with PDFSharp

All times are UTC

PDF pages are blank after splitting with PDFSharp

Moderator: Stefan Lange

AgileDotnetter

Post subject: PDF pages are blank after splitting with PDFSharp

Posted: Fri Feb 05, 2016 3:26 pm

Joined: Fri Feb 05, 2016 2:58 pm
Posts: 6

I haven't seen this particular situation mentioned anywhere on the boards.

I have a project where a vendor is sending us a single PDF containing a number of individual documents (1,800 individual documents, to be exact). Using PDFSharp, we created a routine to split these individual documents into their own PDF. This process splits the documents out correctly.

However, we are unable to open the resulting split PDFs with Adobe Acrobat or Adobe Reader. They open fine in Foxit, but we can't go to production unless they also open in Adobe products. We get the following error message on each page:

Dictionary keys must be direct name objects.

All pages in the document are blank, although their snapshots have the correct page dimensions/orientation.

I'll include the code below. I'd appreciate it if someone could tell me if I'm doing something wrong here, but if this is simply a limitation of PDFSharp I need to know so I can find a substitute tool.

Thanks for the assistance!

Nate Southerland

Code follows:

Code:

   //Clipped
            using (var combinedPdf = LoadCombinedPdf())
            {
                var firstPageIndices = GetEobFirstPageIndicesFromCombinedPdf(combinedPdf).ToArray();
                var eobPackets = SplitIntoEobPackets(combinedPdf, firstPageIndices);
                SaveEobPackets(eobPackets);
            }
   //Clipped


        private PdfDocument LoadCombinedPdf()
        {
            if (Directory.Exists(sourceDirectory))
            {
                var pdfsInSourceDirectory = Directory.GetFiles(sourceDirectory, "*.pdf");
                return PdfReader.Open(pdfsInSourceDirectory.First(), PdfDocumentOpenMode.Import);
            }
            return null;
        }

        private IEnumerable<int> GetEobFirstPageIndicesFromCombinedPdf(PdfDocument combinedPdf)
        {
            for (int i = 0; i < combinedPdf.PageCount; i++)
            {
                var text = combinedPdf.Pages[i].ExtractText();
                if (text.Contains(newPageToken))  
                    yield return i;
            }
        }

        private IEnumerable<EobPdf> SplitIntoEobPackets(PdfDocument combinedPdf, int[] firstPageIndices)
        {
            var packets = new List<EobPdf>();
            foreach (var index in firstPageIndices)
            {
                var eobPdf = ConfigureNewEobPdf(combinedPdf.Pages[index]);
                var newDoc = new PdfDocument();
                newDoc.Version = combinedPdf.Version;
                newDoc.Info.Creator = "Creator";
                var lastPageIndex = combinedPdf.Pages.Count - 1;
                var goToNextPage = true;
                for (int i = index; goToNextPage; i++)
                {
                    newDoc.AddPage(combinedPdf.Pages[i]);
                    var nextIndex = i + 1;
                    goToNextPage = nextIndex <= lastPageIndex && !firstPageIndices.Contains(nextIndex);
                }
                eobPdf.Content = newDoc;
                packets.Add(eobPdf);
            }
            return packets;
        }

        private EobPdf ConfigureNewEobPdf(PdfPage eobFrontPage)
        {
            var rawMemberId = GetMemberId(eobFrontPage);
            var memberId = string.IsNullOrWhiteSpace(rawMemberId) ? "ERR" : rawMemberId;
            var letterDate = GetLetterDate(eobFrontPage);
            return new EobPdf()
            {
                MemberID = memberId,
                EobDate = letterDate,
                FullFileName = string.Format("{0}\\{1:yyyyMMdd}-{2} FileName.pdf", targetDirectory, letterDate, memberId)
            };
        }

        private static string GetMemberId(PdfPage eobFrontPage)
        {
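            // FirstOrDefault avoids an exception when the page has no "Member ID:" line;
            // the caller already maps an empty result to "ERR".
            var memberIdLine = eobFrontPage.ExtractText().FirstOrDefault(p => p.IndexOf("Member ID:") > -1);
            if (memberIdLine == null)
                return string.Empty;
            var splitMemberIdLine = memberIdLine.Split(':');
            if (splitMemberIdLine.Length > 1)
                return splitMemberIdLine[1].Trim();
            return string.Empty;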
        }

        private DateTime GetLetterDate(PdfPage eobFrontPage)
        {
            var contents = eobFrontPage.ExtractText();
            DateTime letterDate = DateTime.MinValue;
            foreach (var phrase in contents)
            {
                if (DateTime.TryParse(phrase, out letterDate))
                    return letterDate;
            }
            return DateTime.MinValue;
        }

        private void SaveEobPackets(IEnumerable<EobPdf> eobPackets)
        {
            foreach (var eob in eobPackets)
            {
                if (Directory.Exists(targetDirectory))
                    SaveFile(eob);
            }
        }

        private void SaveFile(EobPdf file, int iterations = 0)
        {
            var newFilename = iterations == 0 ? file.FullFileName :
                GetIterativeFilename(file.FullFileName, iterations);
            if (File.Exists(newFilename))
                SaveFile(file, iterations + 1);
            else
                file.Content.Save(newFilename);
        }

        private string GetIterativeFilename(string fullFileName, int iterations)
        {
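            // Use Path helpers so a dot elsewhere in the path (e.g. in a directory name
            // or member ID) cannot truncate the file name.
            var directory = Path.GetDirectoryName(fullFileName) ?? string.Empty;
            var baseName = Path.GetFileNameWithoutExtension(fullFileName);
            return Path.Combine(directory, string.Format("{0}{1}.pdf", baseName, iterations));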
        }
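
The packet boundaries that SplitIntoEobPackets works out page by page can also be computed up front from the first-page indices. The following is only a minimal sketch of that idea, not part of the posted code or of PDFsharp: `PacketRanges` and `FromFirstPageIndices` are hypothetical names, and it assumes each packet runs from its first page up to the page before the next packet's first page (or the last page of the document).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of PDFsharp or the original post.
public static class PacketRanges
{
    // Returns inclusive (firstPage, lastPage) index pairs, one per packet.
    public static List<Tuple<int, int>> FromFirstPageIndices(IEnumerable<int> firstPageIndices, int pageCount)
    {
        // Normalize the input: drop out-of-range values, remove duplicates, sort ascending.
        var starts = firstPageIndices
            .Where(i => i >= 0 && i < pageCount)
            .Distinct()
            .OrderBy(i => i)
            .ToList();

        var ranges = new List<Tuple<int, int>>();
        for (int n = 0; n < starts.Count; n++)
        {
            // A packet ends just before the next packet starts, or on the last page.
            int last = (n + 1 < starts.Count ? starts[n + 1] : pageCount) - 1;
            ranges.Add(Tuple.Create(starts[n], last));
        }
        return ranges;
    }
}
```

Each pair could then drive a plain `for` loop over `combinedPdf.Pages`, which avoids calling `firstPageIndices.Contains(...)` once per page. With 1,800 packets that repeated lookup adds up.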

TH-Soft

Post subject: Re: PDF pages are blank after splitting with PDFSharp

Posted: Fri Feb 05, 2016 4:18 pm

PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 971
Location: CCAA

Hi!

Could be a bug in PDFsharp.

Do you get the message "Dictionary keys must be direct name objects." for any PDF file you split or is this message specific to some PDF files only?
We'd be wasting our time if the error "Dictionary keys must be direct name objects." only comes with specific PDF files.

Do you use the latest version of PDFsharp 1.50?

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)

AgileDotnetter

Post subject: Re: PDF pages are blank after splitting with PDFSharp

Posted: Fri Feb 05, 2016 5:25 pm

Joined: Fri Feb 05, 2016 2:58 pm
Posts: 6

We're using 1.50. As a side note, the only 1.50 NuGet package is the beta version. Is that correct?

We get the message on every new page in every split PDF opened with an Adobe product - Acrobat or Reader.

I attempted to attach sample PDF files to this post, but wasn't able to. (Error message: "The image file you tried to attach is invalid.") I can email them separately if you'd like.

Thanks for the quick response!

Nate

TH-Soft

Post subject: Re: PDF pages are blank after splitting with PDFSharp

Posted: Sat Feb 06, 2016 10:21 am

PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 971
Location: CCAA

ZIP the PDF. You can upload ZIP files up to 256 kiB to the forum.
viewtopic.php?f=2&t=832

If the ZIP file is larger, then maybe use a file hoster.

AgileDotnetter

Post subject: Re: PDF pages are blank after splitting with PDFSharp

Posted: Wed Feb 10, 2016 1:45 pm

Joined: Fri Feb 05, 2016 2:58 pm
Posts: 6

Thanks for the quick response, Thomas, and sorry for the delay. Other projects took priority for a bit.

Uploaded the zip file. The file contains the original sample file as well as the results of the splitting process.

Attachments:

FirstCare Sample.zip [200.89 KiB]
Downloaded 521 times

AgileDotnetter

Post subject: Re: PDF pages are blank after splitting with PDFSharp

Posted: Thu Feb 18, 2016 5:39 pm

Joined: Fri Feb 05, 2016 2:58 pm
Posts: 6

Hey folks,

I'm afraid this thread has rolled into the corner and been forgotten. If I can't get an answer, we're going to have to look elsewhere for a commercial solution. We're probably willing to pay for support at this point, but the purchase page doesn't have any information about how to move forward.

Thanks to Thomas for his pro bono attention on this so far.

I'll check back in occasionally to see if there are any updates to the thread.

Thanks,

Nate

AgileDotnetter

Post subject: Re: PDF pages are blank after splitting with PDFSharp

Posted: Thu Feb 18, 2016 10:53 pm

Joined: Fri Feb 05, 2016 2:58 pm
Posts: 6

Update:

I've discovered something interesting.

I ran a different, run-of-the-mill PDF through a simplified test version of the splitting process. This simplified test process reads the source file, splits the first two pages into new PDF documents, and saves them to the target directory.

Code:

            var rootDirectory = @"C:\PdfSpliterTests\";
            var sourceFilePath = string.Format("{0}normal.pdf", rootDirectory);
            var sourceDoc = PdfReader.Open(sourceFilePath, PdfDocumentOpenMode.Import);
            for (int i = 0; i < 2; i++)
            {
                var newDoc = new PdfDocument();
                newDoc.AddPage(sourceDoc.Pages[i]);
                var filepath = string.Format(@"{0}Target\Test Result {1}.pdf", rootDirectory, i);
                newDoc.Save(filepath);
            }

The resulting PDFs opened without issue. However, the original data PDF from our vendor still had the issue when run through this simplified process. So it doesn't appear that PDFSharp itself is the issue, unless there's some element in the data PDF that PDFSharp is altering in some errant way. It may be that I'm missing some step in creating the split PDF file that is causing a bad dictionary name to be entered.

Is there a simple way I can find which dictionary entry is the offending entry using PDFSharp? That could help me see if I'm missing a step, or whether there's some element that's running afoul of good naming conventions.

AgileDotnetter

Post subject: Re: PDF pages are blank after splitting with PDFSharp

Posted: Wed Feb 24, 2016 8:16 pm

Joined: Fri Feb 05, 2016 2:58 pm
Posts: 6

Update, part deux:

Never mind. Ultimately this was the same old issue with PDFSharp not fully implementing the PDF 1.5 spec yet. (See FAQ here) We abandoned PDFSharp for a different tool.
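
One rough way to check in advance whether a source file relies on PDF 1.5 features is to scan its raw bytes for object streams and cross-reference streams, both of which were introduced in PDF 1.5. The sketch below rests on assumptions: the post does not say which 1.5 feature was involved, and `PdfFeatureProbe` and its methods are hypothetical names, not PDFsharp API. It only looks for the `/Type /ObjStm` and `/Type /XRef` dictionary entries in the undecoded file, so treat a hit as a hint rather than proof.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical probe for illustration; it does not use or represent PDFsharp.
public static class PdfFeatureProbe
{
    // Returns the "x.y" part of the "%PDF-x.y" header, or null if no header is found.
    public static string HeaderVersion(byte[] pdfBytes)
    {
        // The header is expected near the start of the file; inspect the first 1024 bytes.
        var head = Encoding.ASCII.GetString(pdfBytes, 0, Math.Min(pdfBytes.Length, 1024));
        int at = head.IndexOf("%PDF-", StringComparison.Ordinal);
        if (at < 0 || at + 8 > head.Length)
            return null;
        return head.Substring(at + 5, 3);
    }

    // True if the raw file contains an object stream dictionary (/Type /ObjStm).
    public static bool UsesObjectStreams(byte[] pdfBytes)
    {
        return ContainsTypeName(pdfBytes, "ObjStm");
    }

    // True if the raw file contains a cross-reference stream dictionary (/Type /XRef).
    public static bool UsesXRefStreams(byte[] pdfBytes)
    {
        return ContainsTypeName(pdfBytes, "XRef");
    }

    private static bool ContainsTypeName(byte[] pdfBytes, string typeName)
    {
        // Bytes above 127 become '?' under ASCII decoding, which is harmless for this search.
        var text = Encoding.ASCII.GetString(pdfBytes);
        var pattern = @"/Type\s*/" + Regex.Escape(typeName) + @"(?![A-Za-z0-9])";
        return Regex.IsMatch(text, pattern);
    }
}
```

The byte array would typically come from `System.IO.File.ReadAllBytes` on the vendor file. Running the probe on both the vendor PDF and the "normal" PDF that split cleanly would show whether they differ in these features.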
