PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC



Post new topic Reply to topic  [ 8 posts ] 
PostPosted: Fri Feb 05, 2016 3:26 pm 

Joined: Fri Feb 05, 2016 2:58 pm
Posts: 6
I haven't seen this particular situation mentioned anywhere on the boards.

I have a project where a vendor sends us a single PDF containing a number of individual documents (1,800 individual documents, to be exact). Using PDFSharp, we created a routine that splits them into separate PDF files, one per document. The split itself works: the documents are separated correctly.

However, we cannot properly open the resulting PDFs in Adobe Acrobat or Adobe Reader. They open fine in Foxit, but we can't go to production unless they also open in Adobe's products. Adobe shows the following error message on each page:

Dictionary keys must be direct name objects.

All pages in the document are blank, although their snapshots have the correct page dimensions/orientation.

I'll include the code below. I'd appreciate it if someone could tell me whether I'm doing something wrong here; if this is simply a limitation of PDFSharp, I need to know so I can find a substitute tool.

Thanks for the assistance!

Nate Southerland

Code follows:
Code:
   //Clipped
            using (var combinedPdf = LoadCombinedPdf())
            {
                var firstPageIndices = GetEobFirstPageIndicesFromCombinedPdf(combinedPdf).ToArray();
                var eobPackets = SplitIntoEobPackets(combinedPdf, firstPageIndices);
                SaveEobPackets(eobPackets);
            }
   //Clipped


        private PdfDocument LoadCombinedPdf()
        {
            if (Directory.Exists(sourceDirectory))
            {
                var pdfsInSourceDirectory = Directory.GetFiles(sourceDirectory, "*.pdf");
                return PdfReader.Open(pdfsInSourceDirectory.First(), PdfDocumentOpenMode.Import);
            }
            return null;
        }

        private IEnumerable<int> GetEobFirstPageIndicesFromCombinedPdf(PdfDocument combinedPdf)
        {
            for (int i = 0; i < combinedPdf.PageCount; i++)
            {
                var text = combinedPdf.Pages[i].ExtractText();
                if (text.Contains(newPageToken)) 
                    yield return i;
            }
        }

        private IEnumerable<EobPdf> SplitIntoEobPackets(PdfDocument combinedPdf, int[] firstPageIndices)
        {
            var packets = new List<EobPdf>();
            foreach (var index in firstPageIndices)
            {
                var eobPdf = ConfigureNewEobPdf(combinedPdf.Pages[index]);
                var newDoc = new PdfDocument();
                newDoc.Version = combinedPdf.Version;
                newDoc.Info.Creator = "Creator";
                var lastPageIndex = combinedPdf.Pages.Count - 1;
                var goToNextPage = true;
                for (int i = index; goToNextPage; i++)
                {
                    newDoc.AddPage(combinedPdf.Pages[i]);
                    var nextIndex = i + 1;
                    goToNextPage = nextIndex <= lastPageIndex && !firstPageIndices.Contains(nextIndex);
                }
                eobPdf.Content = newDoc;
                packets.Add(eobPdf);
            }
            return packets;
        }

        private EobPdf ConfigureNewEobPdf(PdfPage eobFrontPage)
        {
            var rawMemberId = GetMemberId(eobFrontPage);
            var memberId = string.IsNullOrWhiteSpace(rawMemberId) ? "ERR" : rawMemberId;
            var letterDate = GetLetterDate(eobFrontPage);
            return new EobPdf()
            {
                MemberID = memberId,
                EobDate = letterDate,
                FullFileName = Path.Combine(targetDirectory, string.Format("{0:yyyyMMdd}-{1} FileName.pdf", letterDate, memberId))
            };
        }

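        private static string GetMemberId(PdfPage eobFrontPage)
        {
            // ExtractText() is not shown in the post; it is assumed to be a custom
            // extension method that returns the page's text fragments as strings.
            var memberIdLine = eobFrontPage.ExtractText().FirstOrDefault(p => p.Contains("Member ID:"));
            if (memberIdLine == null)
                return string.Empty; // the caller substitutes "ERR"
            // Split on the first ':' only, so a value containing ':' is kept whole.
            var splitMemberIdLine = memberIdLine.Split(new[] { ':' }, 2);
            return splitMemberIdLine.Length > 1 ? splitMemberIdLine[1].Trim() : string.Empty;
        }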

        private DateTime GetLetterDate(PdfPage eobFrontPage)
        {
            var contents = eobFrontPage.ExtractText();
            DateTime letterDate = DateTime.MinValue;
            foreach (var phrase in contents)
            {
                if (DateTime.TryParse(phrase, out letterDate))
                    return letterDate;
            }
            return DateTime.MinValue;
        }

        private void SaveEobPackets(IEnumerable<EobPdf> eobPackets)
        {
            foreach (var eob in eobPackets)
            {
                if (Directory.Exists(targetDirectory))
                    SaveFile(eob);
            }
        }

        private void SaveFile(EobPdf file, int iterations = 0)
        {
            var newFilename = iterations == 0 ? file.FullFileName :
                GetIterativeFilename(file.FullFileName, iterations);
            if (File.Exists(newFilename))
                SaveFile(file, iterations + 1);
            else
                file.Content.Save(newFilename);
        }

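        private string GetIterativeFilename(string fullFileName, int iterations)
        {
            // Append the counter to the file name only; splitting the whole path on '.'
            // would truncate it if a directory name or the member ID contains a dot.
            var directory = Path.GetDirectoryName(fullFileName);
            var baseName = Path.GetFileNameWithoutExtension(fullFileName);
            return Path.Combine(directory, string.Format("{0}{1}.pdf", baseName, iterations));
        }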
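The page-grouping rule in `SplitIntoEobPackets` (each packet runs from one first-page index up to, but not including, the next one, and the last packet runs to the end of the document) can be isolated from PDFsharp entirely. This is a minimal sketch with a hypothetical helper, `GetPacketRanges`, assuming C# 7 or later for value tuples; it only computes zero-based page ranges, which could then be copied page by page with `newDoc.AddPage(combinedPdf.Pages[i])` as in the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PacketRanges
{
    // Hypothetical helper: converts the zero-based first-page index of each packet
    // into a (Start, PageCount) range over a document with pageCount pages.
    public static List<(int Start, int PageCount)> GetPacketRanges(IEnumerable<int> firstPageIndices, int pageCount)
    {
        // Drop out-of-range indices, then sort and de-duplicate so ranges come out in page order.
        var starts = firstPageIndices
            .Where(i => i >= 0 && i < pageCount)
            .Distinct()
            .OrderBy(i => i)
            .ToArray();

        var ranges = new List<(int Start, int PageCount)>();
        for (int k = 0; k < starts.Length; k++)
        {
            // A packet ends just before the next first page, or at the end of the document.
            int end = k + 1 < starts.Length ? starts[k + 1] : pageCount;
            ranges.Add((starts[k], end - starts[k]));
        }
        return ranges;
    }

    public static void Main()
    {
        // Example: a 7-page file whose packets start on pages 0, 3 and 4.
        foreach (var (start, pageCount) in GetPacketRanges(new[] { 0, 3, 4 }, 7))
            Console.WriteLine($"pages {start}..{start + pageCount - 1}");
        // Prints "pages 0..2", "pages 3..3", "pages 4..6".
    }
}
```

Sorting the indices once and reading the next start from the array also avoids calling `firstPageIndices.Contains(nextIndex)` for every page, which scans the whole index array each time.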


PostPosted: Fri Feb 05, 2016 4:18 pm 
PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 971
Location: CCAA
Hi!

Could be a bug in PDFsharp.

Do you get the message "Dictionary keys must be direct name objects." for every PDF file you split, or is this message specific to some PDF files only?
We'd be wasting our time if the error "Dictionary keys must be direct name objects." only occurs with specific PDF files.

Do you use the latest version of PDFsharp 1.50?

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)


PostPosted: Fri Feb 05, 2016 5:25 pm 

Joined: Fri Feb 05, 2016 2:58 pm
Posts: 6
We're using 1.50. As a side note, the only 1.50 NuGet package is the beta version. Is that correct?

We get the message on every new page in every split PDF opened with an Adobe product - Acrobat or Reader.

I attempted to attach sample PDF files to this post, but wasn't able to. (Error message: "The image file you tried to attach is invalid.") I can email them separately if you'd like.

Thanks for the quick response!

Nate


PostPosted: Sat Feb 06, 2016 10:21 am 
PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 971
Location: CCAA
ZIP the PDF. You can upload ZIP files up to 256 KiB to the forum.
viewtopic.php?f=2&t=832

If the ZIP file is larger, then maybe use a file hosting service.

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)


PostPosted: Wed Feb 10, 2016 1:45 pm 

Joined: Fri Feb 05, 2016 2:58 pm
Posts: 6
Thanks for the quick response, Thomas, and sorry for the delay. Other projects took priority for a bit.

Uploaded the zip file. The file contains the original sample file as well as the results of the splitting process.


Attachments:
FirstCare Sample.zip [200.89 KiB]
Downloaded 520 times
PostPosted: Thu Feb 18, 2016 5:39 pm 

Joined: Fri Feb 05, 2016 2:58 pm
Posts: 6
Hey folks,

I'm afraid this thread has rolled into a corner and been forgotten. If I can't get an answer, we're going to have to look elsewhere for a commercial solution. We're probably willing to pay for support at this point, but the purchase page doesn't have any information about how to move forward.

Thanks to Thomas for his pro bono attention on this so far.

I'll check back in occasionally to see if there are any updates to the thread.

Thanks,

Nate


PostPosted: Thu Feb 18, 2016 10:53 pm 

Joined: Fri Feb 05, 2016 2:58 pm
Posts: 6
Update:

I've discovered something interesting.

I ran a different, run-of-the-mill PDF through a simplified test version of the splitting process. This simplified test reads the source file, copies each of the first two pages into its own new PDF document, and saves both to the target directory.

Code:
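            var rootDirectory = @"C:\PdfSpliterTests\";
            var sourceFilePath = Path.Combine(rootDirectory, "normal.pdf");
            var targetDirectory = Path.Combine(rootDirectory, "Target");
            Directory.CreateDirectory(targetDirectory); // no-op if it already exists
            using (var sourceDoc = PdfReader.Open(sourceFilePath, PdfDocumentOpenMode.Import))
            {
                // Copy each of the first two pages into its own new document.
                for (int i = 0; i < 2; i++)
                {
                    using (var newDoc = new PdfDocument())
                    {
                        newDoc.AddPage(sourceDoc.Pages[i]);
                        newDoc.Save(Path.Combine(targetDirectory, string.Format("Test Result {0}.pdf", i)));
                    }
                }
            }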


The resulting PDFs opened without issue. However, the original data PDF from our vendor still had the issue when run through this simplified process. So it doesn't appear that PDFSharp itself is the issue, unless there's some element in the data PDF that PDFSharp is altering in some errant way. It may be that I'm missing some step in creating the split PDF file that is causing a bad dictionary name to be entered.

Is there a simple way to find the offending dictionary entry using PDFSharp? That could help me see whether I'm missing a step, or whether some element is running afoul of good naming conventions.


PostPosted: Wed Feb 24, 2016 8:16 pm 

Joined: Fri Feb 05, 2016 2:58 pm
Posts: 6
Update, part deux:

Never mind. Ultimately this was the same old issue: PDFSharp does not yet fully implement the PDF 1.5 specification (see the FAQ here). We abandoned PDFSharp for a different tool.
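Since the cause came down to PDF 1.5 support, one rough way to see whether a vendor file uses features introduced in PDF 1.5, namely cross-reference streams and compressed object streams, is to scan its raw bytes for `/Type /XRef` and `/Type /ObjStm` stream dictionaries. This is a heuristic sketch, not a PDFsharp API: `PdfFeatureCheck` and `Inspect` are hypothetical names, it assumes C# 7 or later and that the whole file fits in memory, and a positive result only shows that those features are present, not that they produced the Adobe error.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class PdfFeatureCheck
{
    // Heuristic only (hypothetical helper, not a PDFsharp API): reports the header
    // version and whether PDF 1.5 cross-reference or object streams appear in the file.
    public static (string HeaderVersion, bool HasXRefStream, bool HasObjectStream) Inspect(byte[] bytes)
    {
        // Map each byte to one char so binary stream data cannot break decoding.
        var chars = new char[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
            chars[i] = (char)bytes[i];
        var text = new string(chars);

        // The "%PDF-x.y" header is normally at the start; look in the first 1024 bytes.
        var head = text.Length > 1024 ? text.Substring(0, 1024) : text;
        var header = Regex.Match(head, @"%PDF-(\d\.\d)");
        var version = header.Success ? header.Groups[1].Value : "unknown";

        // Stream dictionaries are never compressed into object streams,
        // so these /Type entries are readable as plain text.
        bool hasXRefStream = Regex.IsMatch(text, @"/Type\s*/XRef\b");
        bool hasObjectStream = Regex.IsMatch(text, @"/Type\s*/ObjStm\b");
        return (version, hasXRefStream, hasObjectStream);
    }

    public static void Main(string[] args)
    {
        // Usage: pass one or more PDF paths on the command line.
        foreach (var path in args)
        {
            var (version, xref, objStm) = Inspect(File.ReadAllBytes(path));
            Console.WriteLine($"{path}: header {version}, xref stream: {xref}, object streams: {objStm}");
        }
    }
}
```

Running it on both the vendor file and the `normal.pdf` from the simplified test would show whether the two differ in this respect.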

