PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
All times are UTC


Posted: Tue Jun 08, 2021 7:27 pm
Joined: Tue Jun 08, 2021 7:13 pm
Posts: 3
I'm using PDFsharp to merge many PDFs (stored on disk) into one PDF. Sometimes the end product PDF can be as large as 700MB. I'm using the sample code provided that basically creates an output PdfDocument, adds pages to it, and then calls outputDocument.Save(destinationPath), so the amount of memory used is about the same as the size of documents produced. Here's a link to the sample:

http://www.pdfsharp.net/wiki/concatenat ... ample.ashx

I tried passing a FileStream to the PdfDocument constructor when creating the output, but that did not seem to work. Somebody suggested that I write a certain number of files, close the PDF, re-open it using PdfReader.Open() and continue. I'm not sure how that would help, since as far as I know PdfReader.Open() loads the whole document into memory, but I tried it and, sure enough, memory consumption did not appear to decrease.

Below is the code for a simple console app that tries to merge 2000 files; it saves and closes the output document every 500 files and then re-opens it. I'm using PDFsharp-MigraDoc-gdi 1.50.5147, targeting .NET Framework 4.5.

If this cannot be done with PDFsharp, would MigraDoc be any help?

Code:
using System;
using System.Collections.Generic;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

namespace PdfSharpMergeTest
{
    class Program
    {
        public static void Main(string[] args)
        {
            var files = new List<string>();
            var basePath = AppDomain.CurrentDomain.BaseDirectory;

            for (var i = 0; i < 2000; i++)
            {
                files.Add(Path.Combine(basePath, "sample.pdf"));
            }
            DoMerge(files, Path.Combine(basePath, "output.pdf"));
        }

        private static void DoMerge(List<string> paths, string destinationFile)
        {

            var directory = Path.GetDirectoryName(destinationFile);

            if (!Directory.Exists(directory))
            {
                Directory.CreateDirectory(directory);
            }

            var outputDocument = new PdfDocument();
            var count = 0;

            // Iterate files
            foreach (string path in paths)
            {
                // Open the document to import pages from it.
                try
                {
                    var inputDocument = PdfReader.Open(path, PdfDocumentOpenMode.Import);

                    // Iterate pages
                    for (int idx = 0; idx < inputDocument.PageCount; idx++)
                    {
                        // Get the page from the external document...
                        PdfPage page = inputDocument.Pages[idx];
                        // ...and add it to the output document.
                        outputDocument.AddPage(page);
                    }

                    inputDocument.Dispose();
                   
                    count++;
                    if (count % 500 == 0 || count == paths.Count)
                    {
                        outputDocument.Save(destinationFile);
                        outputDocument.Dispose();

                        if (count < paths.Count)
                        {
                            outputDocument = PdfReader.Open(destinationFile, PdfDocumentOpenMode.Import);
                        }
                    }
                }
                catch (Exception ex)
                {
                    Console.WriteLine(ex.Message);
                    Console.WriteLine(ex.StackTrace);
                }
            }
        }
    }
}
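
For comparison, here is a minimal sketch of the same close-and-reopen loop with two changes: input documents are wrapped in `using` so they are disposed even when a page import throws, and the output is re-opened with PdfDocumentOpenMode.Modify, the mode PDFsharp provides for documents that will be changed and saved again. `ChunkedMerge`, `Merge` and the `interval` parameter are hypothetical names introduced for illustration; only the PDFsharp calls already used in the listing above appear here. Nothing in this thread confirms that this variant lowers peak memory, so treat it as a starting point for measurement, not as a fix.

```csharp
using System;
using System.Collections.Generic;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Hypothetical helper, not part of PDFsharp.
public static class ChunkedMerge
{
    public static void Merge(IList<string> paths, string destinationFile, int interval)
    {
        if (interval <= 0) throw new ArgumentOutOfRangeException(nameof(interval));
        if (paths.Count == 0) return; // nothing to merge, so nothing to save

        PdfDocument output = new PdfDocument();
        try
        {
            for (int count = 1; count <= paths.Count; count++)
            {
                // Source documents are opened for import and released right away.
                using (PdfDocument input = PdfReader.Open(paths[count - 1], PdfDocumentOpenMode.Import))
                {
                    for (int idx = 0; idx < input.PageCount; idx++)
                    {
                        output.AddPage(input.Pages[idx]);
                    }
                }

                bool last = count == paths.Count;
                if (count % interval == 0 || last)
                {
                    // Write what has been collected so far and release the document.
                    output.Save(destinationFile);
                    output.Dispose();
                    output = null;

                    if (!last)
                    {
                        // Assumption: Modify is the appropriate mode for a document
                        // that will receive more pages and be saved again.
                        output = PdfReader.Open(destinationFile, PdfDocumentOpenMode.Modify);
                    }
                }
            }
        }
        finally
        {
            if (output != null) output.Dispose();
        }
    }
}
```

Unlike the original listing, this sketch lets exceptions propagate instead of logging and continuing, so a failing input file stops the merge.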


Posted: Wed Jun 09, 2021 7:55 am
PDFsharp Expert
Joined: Sat Mar 14, 2015 10:15 am
Posts: 655
Location: CCAA
Discussed and answered on SO:
https://stackoverflow.com/a/67885787/162529

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)


Posted: Wed Jun 09, 2021 1:59 pm
Joined: Tue Jun 08, 2021 7:13 pm
Posts: 3
TH-Soft wrote:
Discussed and answered on SO:
https://stackoverflow.com/a/67885787/162529


Perhaps I'm doing something wrong? Am I closing and re-opening the right way? I tried a variety of intervals; here are my results.

Sample file: 4 pages, 140KB. Output file: 273MB.

no interval: 21 sec, max memory 330MB
interval 1000: 30 sec, max memory 490MB
interval 500: 55 sec, max memory 710MB
interval 250: 1 min 35 sec, max memory 780MB
interval 100: 2 min 55 sec, max memory 850MB

So reducing the interval not only makes the memory use worse, it also slows the application down significantly. I expected the slowdown, since I assume saving and re-opening is a fairly expensive operation, but it buys me nothing on the memory front. I do see the memory drop as things run, but invariably it climbs back higher and higher.
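
For anyone who wants to reproduce measurements like these, a small harness along the following lines may help. It is only a sketch: `ProbeResult`, `MergeProbe` and `Measure` are hypothetical names, and how the figures above were actually collected is not stated in this thread. It relies on `Stopwatch`, `Process.PeakWorkingSet64` and `GC.GetTotalMemory` from the .NET base class library.

```csharp
using System;
using System.Diagnostics;

// Hypothetical result type for one measured run.
public sealed class ProbeResult
{
    public TimeSpan Elapsed { get; set; }
    public long PeakWorkingSetBytes { get; set; }
    public long ManagedBytesAfter { get; set; }
}

// Hypothetical helper: runs a piece of work and reports time and memory figures.
public static class MergeProbe
{
    public static ProbeResult Measure(Action work)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));

        var stopwatch = Stopwatch.StartNew();
        work();
        stopwatch.Stop();

        using (Process process = Process.GetCurrentProcess())
        {
            process.Refresh(); // make sure the counters are current
            return new ProbeResult
            {
                Elapsed = stopwatch.Elapsed,
                // Peak physical memory of the whole process since it started.
                PeakWorkingSetBytes = process.PeakWorkingSet64,
                // Managed heap size after a forced full collection.
                ManagedBytesAfter = GC.GetTotalMemory(true)
            };
        }
    }
}
```

Inside the Program class from the first post it could be called as `var result = MergeProbe.Measure(() => DoMerge(files, outputPath));`, where `outputPath` is a placeholder for the destination file. Because the peak working set covers the lifetime of the process, each interval setting should be measured in a fresh process; otherwise a later run inherits the peak of an earlier one.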


Posted: Thu Jun 10, 2021 1:15 pm
Joined: Tue Jun 08, 2021 7:13 pm
Posts: 3
Is anybody able to confirm that the code above would be the correct way to close and re-open the file? I'm perfectly fine with the result being "what you are trying to accomplish is not possible", I just don't want to give up if there's something else I could be doing. I think I'll be able to live with the memory consumption, but if I can avoid any risk I will.

