PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly
PostPosted: Wed Apr 18, 2018 7:26 pm 

Joined: Wed Apr 18, 2018 7:17 pm
Posts: 6
I'm trying to get the x and y coordinates of all the images in a pdf document (relative to the page they are on) using PdfSharp.

Is it possible to do this with JUST PDFSharp? If not then with MigraDoc?

I'm able to get the image pixel data, width, height, and some other metadata, but I cannot for the life of me figure out how to get the x and y of the images.

Any help is appreciated!


PostPosted: Wed Apr 18, 2018 8:17 pm 
PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 905
Location: CCAA
Hi!
prw56 wrote:
Is it possible to do this with JUST PDFSharp?
The name is PDFsharp.
How do you define "just"?

An image can be used many times within one PDF file, at various positions and different sizes.
Look at this example:
http://pdfsharp.net/wiki/XForms-sample.ashx

PDFsharp does not really parse the code that describes the pages - that's left as an exercise to the reader. Pay attention to transformations and such.

prw56 wrote:
If not then with MigraDoc?
You're kidding - MigraDoc uses PDFsharp to create PDF files, but doesn't do any PDF magic of its own.

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)


PostPosted: Wed Apr 18, 2018 8:37 pm 

Joined: Wed Apr 18, 2018 7:17 pm
Posts: 6
Quote:
How do you define "just"?

Where no other external libraries besides PdfSharp are used.

Quote:
You're kidding - MigraDoc uses PDFsharp to create PDF files, but doesn't do any PDF magic of its own.

I asked because I was looking for a way to access rendering info; I figured the positions might be accessible from the rendering object. All my Google searches for something like that kept taking me to MigraDoc, which makes sense I guess (because AFAIK PdfSharp doesn't render anything).

Quote:
An image can be used many times within one PDF file, at various positions and different sizes.
Look at this example:
http://pdfsharp.net/wiki/XForms-sample.ashx


I assumed that the same image would be reused, but where are the positions of instances of an image stored?

I can't manage to identify the object that stores the position info in PDFXplorer either, but I know it's got to be present in the PDF stream, so I assume I'm looking in the wrong place or it's split up somehow.


PostPosted: Thu Apr 19, 2018 6:17 am 
PDFsharp Expert

Joined: Sat Mar 14, 2015 10:15 am
Posts: 905
Location: CCAA
prw56 wrote:
Quote:
How do you define "just"?

Where no other external libraries besides PdfSharp are used.
Then the answer is yes.
You just need to write a lot of your own code.

As I understand it, the image can, for example, be used in an XPdfForm object, which in turn can be used as a resource for a PdfPage.
So inside the page the "image" is known as "/I0" or "/Fm0" and you have to look for the code that draws "/I0" or "/Fm0". Then you have to find which image "/I0" refers to and you have to parse the preceding instructions that set transformations etc.
Do this recursively if the referenced object is a form object ("/Fm0") that draws an image indirectly.
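
The walk described above can be sketched roughly as follows. This is a minimal illustration in Python, not PDFsharp code and not anyone's actual implementation: it assumes the content stream has already been decoded to text, it uses a deliberately naive tokenizer, and the names find_draws and concat are made up for this example. It tracks the current transformation matrix through q, Q and cm and reports the matrix in effect at each Do. Resolving a name such as "/I0" or "/Fm0" through the page resources, and recursing into form objects with the current matrix, is not shown.

```python
import re

# Naive tokenizer: PDF names (/I0), numbers, and bare operators.
# Assumption: no strings, arrays, inline images or comments in the stream;
# a real parser has to handle those.
TOKEN = re.compile(r"/[^\s/\[\]()<>]+|[-+]?(?:\d+\.?\d*|\.\d+)|[A-Za-z*'\"]+")

IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def concat(m, ctm):
    """Return m x ctm, i.e. the new matrix after a `cm` with operands m."""
    a, b, c, d, e, f = m
    A, B, C, D, E, F = ctm
    return (a * A + b * C, a * B + b * D,
            c * A + d * C, c * B + d * D,
            e * A + f * C + E, e * B + f * D + F)


def find_draws(content):
    """Yield (resource_name, matrix) for every `Do` in a decoded content stream."""
    ctm, stack, operands = IDENTITY, [], []
    for tok in TOKEN.findall(content):
        if tok[0] == "/":
            operands.append(tok)            # name operand, e.g. /I0
        elif tok[0] in "+-.0123456789":
            operands.append(float(tok))     # numeric operand
        else:                               # operator: consumes the operands
            if tok == "q":
                stack.append(ctm)           # save graphics state
            elif tok == "Q" and stack:
                ctm = stack.pop()           # restore graphics state
            elif tok == "cm" and len(operands) >= 6:
                ctm = concat(tuple(operands[-6:]), ctm)
            elif tok == "Do" and operands:
                yield operands[-1], ctm     # something is painted here
            operands = []


if __name__ == "__main__":
    sample = "q 2 0 0 2 10 10 cm q 50 0 0 50 5 5 cm /I0 Do Q Q"
    for name, matrix in find_draws(sample):
        print(name, matrix)
```

Keeping a stack for q/Q matters because nested cm operators multiply together; reading only the cm directly in front of a Do gives the wrong position as soon as an outer transformation is active.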

It's so complicated that I would not try to do it.
There is a tool (but I forgot the name) that finds the DPI for images in PDF files. If you can find the DPI then you also can find the positions. I'm afraid the tool is closed source.

Attention to details is very important for a quest like this. The name of the library is still PDFsharp.

_________________
Best regards
Thomas
(Freelance Software Developer with several years of MigraDoc/PDFsharp experience)


PostPosted: Thu Apr 19, 2018 9:37 am 
PDFsharp Guru

Joined: Mon Oct 16, 2006 8:16 am
Posts: 3092
Location: Cologne, Germany
Information about parsing the page contents can be found here (3rd post):
https://github.com/empira/PDFsharp/issues/49

Other samples can be found when you search this forum for "extract text".

_________________
Regards
Thomas Hoevel
PDFsharp Team


PostPosted: Fri Apr 20, 2018 4:51 pm 

Joined: Wed Apr 18, 2018 7:17 pm
Posts: 6
While I was digging around in PDFXplorer I noticed that every time I added an image, a new quartet of lines appeared in the Contents element of the page that held the images. In the case of 2 images it looks like this:

q
288 0 0 242.64 70.56 489.6 cm
/IMpndHOabj Do
Q
q
204.48 0 0 185.03999 381.6 547.2 cm
/IMdNmzydvg Do
Q

There are 4 non-zero numbers in each quartet (on the line ending in cm) that, as far as I can tell, refer to: actual width, actual height, x (from left side of page to left side of image), and y (from bottom of page to bottom of image).
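
That reading holds when the two zeros stay zero. In general the six numbers in front of cm form a transformation matrix a b c d e f, and an image is drawn into the unit square mapped through it, so rotation or skew puts non-zero values in b and c. A small Python sketch of the general case, assuming this cm is the only transformation in effect and the page has no rotation or shifted origin (the function name image_bbox is made up for this example):

```python
def image_bbox(a, b, c, d, e, f):
    """Bounding box (x_min, y_min, x_max, y_max) of an image drawn under the
    matrix `a b c d e f cm`, in page units measured from the bottom-left.

    The image occupies the unit square, so its four corners are mapped
    through the matrix and the extremes are taken.
    """
    corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
    xs = [a * u + c * v + e for u, v in corners]
    ys = [b * u + d * v + f for u, v in corners]
    return min(xs), min(ys), max(xs), max(ys)


if __name__ == "__main__":
    # First image from the post: no rotation, so the box starts at (e, f)
    # and spans a by d.
    print(image_bbox(288, 0, 0, 242.64, 70.56, 489.6))
    # A 100 x 100 image rotated by 90 degrees: here e and f are no longer
    # the lower-left corner of the box.
    print(image_bbox(0, 100, -100, 0, 200, 50))
```

For the unrotated images in the post this reduces to x = e, y = f, width = a, height = d, which matches the observation above.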

Now I'm trying to test editing the stream to move the image in PdfSharp, but I'm having trouble identifying the encoding of the Contents stream. Do you guys know where to retrieve the encoding from?

(Here's the pdf file I'm using btw: https://nofile.io/f/Nm4EsYYmDsW/singleImage+(2).pdf)
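
On the encoding question: a content stream is a sequence of bytes holding mostly ASCII operators and numbers, and how it is packed is named by the /Filter entry of the stream's dictionary. A very common value is /FlateDecode, which is zlib/deflate compression. The Python sketch below assumes that single filter with no extra decode parameters and uses a stand-in byte string instead of a real file; decode_stream and encode_stream are hypothetical helper names, and how PDFsharp exposes the raw or unfiltered bytes is not shown here.

```python
import zlib


def decode_stream(raw, filter_name):
    """Undo the stream's /Filter.

    Only "no filter" and a single /FlateDecode are handled in this sketch;
    filter arrays and decode parameters are out of scope.
    """
    if filter_name is None:
        return raw
    if filter_name == "/FlateDecode":
        return zlib.decompress(raw)
    raise NotImplementedError(f"filter {filter_name} not handled in this sketch")


def encode_stream(data):
    """Re-apply /FlateDecode so the edited bytes can be written back."""
    return zlib.compress(data)


# Stand-in for a page's compressed Contents stream (first image in the post).
original = b"q\n288 0 0 242.64 70.56 489.6 cm\n/IMpndHOabj Do\nQ\n"
raw = zlib.compress(original)

ops = decode_stream(raw, "/FlateDecode")
# Move the image by changing the last two operands (x and y) of its cm line.
moved = ops.replace(b"70.56 489.6 cm", b"100 500 cm", 1)
new_raw = encode_stream(moved)
new_length = len(new_raw)  # the stream dictionary's /Length has to match this
```

A plain byte replace only works here because the exact operand text is known; a more careful edit would parse the operators first and rewrite the matrix of the specific Do it belongs to.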

