PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC


Posted: Wed May 08, 2019 11:24 am
Hello everyone,

I work with an array of PdfDocument objects:
Code:
PdfDocument[] results;

Is there a way to merge this array of documents into one big PdfDocument without writing it to disk?

I have already tried to create an array of pages and add them to a document, but I can't, because each page must belong to a document.
I also searched the web and went over the following solutions, which did not help me because I do not want to write the PDFs to disk before merging them.

http://www.pdfsharp.net/wiki/Concatenat ... ample.ashx
http://pdfsharp.com/PDFsharp/index.php% ... temid%3D60
http://pdfsharp.net/wiki/CombineDocuments-sample.ashx

I work with version 1.51.5185-beta

Cheers and thanks for the help


Posted: Wed May 08, 2019 12:41 pm
You can write the PDFs to MemoryStream objects if you do not want to write them to disk.
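
A minimal sketch of that round trip, assuming PDFsharp 1.5x and that every document in the array was created in code (not opened in Import mode) and contains at least one page. The names InMemoryMerge and Merge are made up for illustration; Save(Stream, bool), PdfReader.Open(Stream, PdfDocumentOpenMode) and AddPage are the PDFsharp calls it relies on.

```csharp
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

static class InMemoryMerge
{
    // Hypothetical helper: merges already-built documents without touching the disk.
    public static PdfDocument Merge(PdfDocument[] results)
    {
        var merged = new PdfDocument();

        foreach (PdfDocument source in results)
        {
            // MemoryStream only holds managed memory, so it is simply left to the GC here.
            var buffer = new MemoryStream();

            // Serialize to RAM; 'false' asks PDFsharp to leave the stream open.
            source.Save(buffer, false);

            // Rewind before reading the bytes back.
            buffer.Position = 0;

            // Import mode yields pages that may be added to another document.
            PdfDocument import = PdfReader.Open(buffer, PdfDocumentOpenMode.Import);
            for (int i = 0; i < import.PageCount; i++)
            {
                merged.AddPage(import.Pages[i]);
            }
        }

        return merged;
    }
}
```

Usage would be something like `PdfDocument big = InMemoryMerge.Merge(results);` followed by a single `big.Save(...)`. Each source document is still serialized and parsed once, so this avoids disk I/O but not the CPU cost of the round trip.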

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Wed May 08, 2019 3:34 pm
But then I would have to write all the PDFs to a MemoryStream just to read them again afterwards. Is there no way to use PDFDocuments directly? I have to work with massive numbers of PDFs, and time efficiency is very important for my project.


Posted: Wed May 08, 2019 4:42 pm
PAquaticus wrote:
Is there no way to use PDFDocuments directly?

Why do you create multiple files in the first place? Simply create one big file right from the start.

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Mon May 13, 2019 8:39 am
Thomas Hoevel wrote:
PAquaticus wrote:
Is there no way to use PDFDocuments directly?
Why do you create multiple files in the first place? Simply create one big file right from the start.


Because I want to create up to a million PDFs in as little time as possible for a project I am working on. I have time restrictions, so I have to make use of everything I can. Therefore I create the PDFs in parallel. But writing every single one to disk is a bottleneck. My NVMe drive can handle many IOPS, but I read in other posts in this forum that PDFsharp is not thread-safe, which still appears to be true in my case. So I would like to concatenate them and write them as one big file.


Posted: Mon May 13, 2019 4:06 pm
PAquaticus wrote:
My NVMe drive can handle many IOPS, but I read in other posts in this forum that PDFsharp is not thread-safe, which still appears to be true in my case.

Sounds as if you are not sure whether PDFsharp is thread-safe.
AFAIK the only issues are with font handling, and there are known workarounds. Posts about version 1.32 and older may be outdated.

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Fri Nov 15, 2019 5:39 am
Thomas Hoevel wrote:
Why do you create multiple files in the first place? Simply create one big file right from the start.

Hi,

I have a similar use case. Say, I have 100 transactions to generate receipts for. I create multiple one-page files and place them into individual folders (say, C:\receipts\1\receipt.pdf, C:\receipts\2\receipt.pdf, ... C:\receipts\99\receipt.pdf, C:\receipts\100\receipt.pdf). But I also need to create one large batch file that contains all of the 100 individual one-page receipts.

What I am doing right now (and it works), is to loop through the 100 records and, in each of the 100 iterations:
  1. Generate a PdfPage.
  2. Add the one page to a PdfDocument.
  3. Save the document to the appropriate folder.
  4. Re-open the just-generated file.
  5. Create a new PdfPage from the first (and only) page in it.
  6. Append the page to the batch PdfDocument.

After the loop exits, save the 100-page batch PDF file.

This works well, but re-opening 100 (or 1000 or 5000) files seems inefficient. What I would like to do is avoid Steps 4 and 5. Instead, I'd like to add the PdfPage from Step 1 to two different PdfDocuments.

Is it possible? Thank you for any suggestions.


Posted: Fri Nov 15, 2019 7:20 am
pdfuser1 wrote:
This works well, but re-opening 100 (or 1000 or 5000) files seems inefficient.

I'd modify step 1 to create two identical PDF pages at the same time - add one to a single-page PDF, add the other to the big PDF.
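
A rough sketch of that idea, assuming the page content is drawn in code with XGraphics (it does not cover pages that come from a template file). TwinPages, DrawReceipt and AddReceipt are hypothetical names, and the Arial font is an assumption about the machine the code runs on.

```csharp
using PdfSharp.Drawing;
using PdfSharp.Pdf;

static class TwinPages
{
    // Hypothetical drawing routine; the real receipt layout would go here.
    static void DrawReceipt(PdfPage page, string text)
    {
        using (XGraphics gfx = XGraphics.FromPdfPage(page))
        {
            var font = new XFont("Arial", 12); // assumes Arial is available
            gfx.DrawString(text, font, XBrushes.Black, new XPoint(40, 60));
        }
    }

    // Draws the same content twice: once into a single-page file, once into the batch.
    public static void AddReceipt(PdfDocument batch, string text, string singlePath)
    {
        var single = new PdfDocument();
        DrawReceipt(single.AddPage(), text);
        DrawReceipt(batch.AddPage(), text);
        single.Save(singlePath);
    }
}
```

The drawing work is done twice per receipt, but nothing is read back from disk.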

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)


Posted: Fri Nov 15, 2019 2:17 pm
TH-Soft wrote:
pdfuser1 wrote:
This works well, but re-opening 100 (or 1000 or 5000) files seems inefficient.
I'd modify step 1 to create two identical PDF pages at the same time - add one to a single-page PDF, add the other to the big PDF.

This is a great suggestion, but I actually generate the page by opening a one-page PDF template file (and then adding small chunks of text to it). So I would have to open the template twice, which is just as inefficient (the same number of file system I/O operations).

Ideally, I would want to be able to make a copy of my PdfPage object. But I couldn't get .Clone() to work. :( Is there a way to copy/clone a page or a document object? Perhaps I am not using the .Clone() method correctly? Is there an example I could follow?

Thank you again.


Posted: Fri Nov 15, 2019 2:53 pm
pdfuser1 wrote:
Is there a way to copy/clone a page or a document object?

I don't know if "Clone()" works properly for pages.

With respect to file size of the big file, it is most likely better to create the big file first and split that into many single-page files later.
To minimize disk IO, save the file to a MemoryStream and read it from there.
Things that are somewhat inefficient can still be very fast on modern computers.
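
A hedged sketch of the "big file first, split later" approach, assuming PDFsharp 1.5x, a batch document that was built in code, and an output folder that already exists. BatchSplitter, SaveBatchAndSinglePages and the receipt-N.pdf naming are illustrative only.

```csharp
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

static class BatchSplitter
{
    // Saves the batch once, then cuts single-page files out of the in-memory copy.
    public static void SaveBatchAndSinglePages(PdfDocument batch, string batchPath, string outputFolder)
    {
        var buffer = new MemoryStream();
        batch.Save(buffer, false); // 'false' keeps the stream open

        // Write the batch file straight from memory.
        using (FileStream file = File.Create(batchPath))
        {
            buffer.Position = 0;
            buffer.CopyTo(file);
        }

        // Rewind again and re-read the same bytes in Import mode.
        buffer.Position = 0;
        PdfDocument import = PdfReader.Open(buffer, PdfDocumentOpenMode.Import);

        for (int i = 0; i < import.PageCount; i++)
        {
            var single = new PdfDocument();
            single.AddPage(import.Pages[i]);
            single.Save(Path.Combine(outputFolder, "receipt-" + (i + 1) + ".pdf"));
        }
    }
}
```

Note that the whole batch is held in memory, which may matter for very large files.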

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Fri Nov 15, 2019 4:39 pm
Thomas Hoevel wrote:
pdfuser1 wrote:
Is there a way to copy/clone a page or a document object?
I don't know if "Clone()" works properly for pages.

With respect to file size of the big file, it is most likely better to create the big file first and split that into many single-page files later.
To minimize disk IO, save the file to a MemoryStream and read it from there.
Things that are somewhat inefficient can still be very fast on modern computers.

Yes, I agree. Even with the current "inefficient" process, I can generate 5000 individual files, save them to individual networked folders, and then generate a 5000-page, 800 MB batch file and save it to the network, all in under 10 minutes. And that only happens once a year in my case. Most other times, it's a few dozen (or, at worst, a few hundred) files.

So all this is more of a learning experience for me, in case I ever need to worry about efficiency in another project. That said, I have no experience with Streams or MemoryStreams in general. I did try to go that route, but got nowhere. I probably wasn't closing the stream correctly, because Adobe Acrobat was prompting me to save the file before closing. The file size on disk was indicative of a 3-page document, but I couldn't see anything beyond the first page. Is there a good example I can follow?

I truly appreciate your suggestions so far and am very thankful for the PDFsharp library. It's a great tool that we developers get to use for free. :shock:


Posted: Fri Nov 15, 2019 5:05 pm
The only "official" PDFsharp sample that uses MemoryStream is the web server sample:
http://www.pdfsharp.net/wiki/Clock-sample.ashx

There are many examples for MemoryStream on the web. After saving the document to the MemoryStream and before reading it again, you have to set the position of the MemoryStream to 0.
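
To make the rewind step concrete, here is a hedged sketch of the receipt loop from earlier in this thread with the disk re-read replaced by a MemoryStream. It assumes PDFsharp 1.5x and finished one-page documents; ReceiptBatch, SaveAll and the path-to-document input shape are made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

static class ReceiptBatch
{
    // 'receipts' maps a target file path to a finished one-page document (hypothetical shape).
    public static void SaveAll(IEnumerable<KeyValuePair<string, PdfDocument>> receipts, string batchPath)
    {
        var batch = new PdfDocument();

        foreach (KeyValuePair<string, PdfDocument> entry in receipts)
        {
            var buffer = new MemoryStream();
            entry.Value.Save(buffer, false);                 // serialize once, keep stream open
            File.WriteAllBytes(entry.Key, buffer.ToArray()); // the individual file on disk

            buffer.Position = 0;                             // rewind before reading it back
            PdfDocument copy = PdfReader.Open(buffer, PdfDocumentOpenMode.Import);
            batch.AddPage(copy.Pages[0]);                    // first (and only) page
        }

        // PDFsharp refuses to save a document without pages, so 'receipts' must not be empty.
        batch.Save(batchPath);
    }
}
```

Each receipt is written to disk exactly once and never read back from the file system.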

_________________
Regards
Thomas Hoevel
PDFsharp Team


Posted: Mon Nov 25, 2019 5:49 pm
I got saving to MemoryStream working. It was very easy. I was definitely overthinking it.

Saving to a MemoryStream instead of the file system, and thus avoiding an extra read from the file system, saves me about 25% of the time. 5,000 individual one-page PDFs followed by a 5,000-page batch document are now produced 2 minutes faster (6:40 min vs. 8:40 min).

