PDFsharp & MigraDoc Foundation
https://forum.pdfsharp.net/

How to get the position of an image relative to its page
https://forum.pdfsharp.net/viewtopic.php?f=2&t=3768

Author:  prw56 [ Wed Apr 18, 2018 7:26 pm ]
Post subject:  How to get the position of an image relative to its page

I'm trying to get the x and y coordinates of all the images in a PDF document (relative to the page they are on) using PdfSharp.

Is it possible to do this with JUST PDFSharp? If not then with MigraDoc?

I'm able to get the image pixel data, width, height, and some other metadata, but I cannot for the life of me figure out how to get the x and y of the images.

Any help is appreciated!

Author:  TH-Soft [ Wed Apr 18, 2018 8:17 pm ]
Post subject:  Re: How to get the position of an image relative to its page

Hi!

prw56 wrote:
Is it possible to do this with JUST PDFSharp?

The name is PDFsharp.
How do you define "just"?

An image can be used many times within one PDF file, at various positions and different sizes.
Look at this example:
http://pdfsharp.net/wiki/XForms-sample.ashx

PDFsharp does not really parse the code that describes the pages - that's left as an exercise to the reader. Pay attention to transformations and such.

prw56 wrote:
If not then with MigraDoc?

You're kidding - MigraDoc uses PDFsharp to create PDF files, but doesn't do any PDF magic of its own.

Author:  prw56 [ Wed Apr 18, 2018 8:37 pm ]
Post subject:  Re: How to get the position of an image relative to its page

Quote:
How do you define "just"?

Where no other external libraries besides PdfSharp are used.

Quote:
You're kidding - MigraDoc uses PDFsharp to create PDF files, but doesn't do any PDF magic of its own.

I asked because I was looking for a way to access rendering info; I figured the positions might be accessible from the rendering object. All my Google searches for something like that kept taking me to MigraDoc, which makes sense I guess (AFAIK PdfSharp doesn't render anything).

Quote:
An image can be used many times within one PDF file, at various positions and different sizes.
Look at this example:
http://pdfsharp.net/wiki/XForms-sample.ashx


I assumed that the same image would be reused, but where are the positions of instances of an image stored?

I can't manage to identify the object that stores the position info in PDFXplorer either, but I know it's got to be present in the PDF stream, so I assume I'm looking in the wrong place or it's split up somehow.

Author:  TH-Soft [ Thu Apr 19, 2018 6:17 am ]
Post subject:  Re: How to get the position of an image relative to its page

prw56 wrote:
Quote:
How do you define "just"?

Where no other external libraries besides PdfSharp are used.

Then the answer is yes.
You just need to write a lot of code yourself.

As I understand it, the image can, for example, be used in an XPdfForm object, which in turn can be used as a resource for a PdfPage.
So inside the page the "image" is known as "/I0" or "/Fm0" and you have to look for the code that draws "/I0" or "/Fm0". Then you have to find which image "/I0" refers to and you have to parse the preceding instructions that set transformations etc.
Do this recursively if the referenced object is a form object ("/Fm0") that draws an image indirectly.
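
The procedure described above can be sketched roughly as follows. This is a minimal illustration in Python, not PDFsharp API: the names `multiply`, `find_draws` and the `forms` dictionary are hypothetical, and the sketch assumes an already decoded content stream whose tokens are separated by whitespace (no string literals, comments, or inline images). It also ignores a form XObject's own /Matrix entry, which a real implementation would have to apply.

```python
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def multiply(m, n):
    """Return m x n for PDF matrices stored as (a, b, c, d, e, f)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def find_draws(content, forms=None, ctm=IDENTITY):
    """Yield (name, matrix) for every Do operator, descending into forms.

    `forms` is a hypothetical dict that maps a form XObject name such as
    "/Fm0" to the decoded text of that form's own content stream.
    """
    forms = forms or {}
    stack = []      # matrices saved by q, restored by Q
    operands = []   # tokens seen since the last handled operator
    for token in content.split():
        if token == "q":
            stack.append(ctm)
        elif token == "Q":
            ctm = stack.pop()
        elif token == "cm":
            # cm premultiplies the current transformation matrix.
            new = tuple(float(t) for t in operands[-6:])
            ctm = multiply(new, ctm)
        elif token == "Do":
            name = operands[-1]
            if name in forms:
                # A form draws its content under the current matrix.
                yield from find_draws(forms[name], forms, ctm)
            else:
                yield name, ctm
        else:
            # Operand, or an operator this sketch does not handle.
            operands.append(token)
            continue
        operands = []


page = "q 2 0 0 2 5 5 cm /Fm0 Do Q"
forms = {"/Fm0": "q 100 0 0 50 10 20 cm /I0 Do Q"}
for name, matrix in find_draws(page, forms):
    print(name, matrix)
```

In this made-up example the image "/I0" is drawn inside the form "/Fm0", so the matrix reported for it is the form's own cm combined with the page's cm. Mapping "/I0" back to the actual image object still requires a lookup in the /XObject resources of the page or form.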

It's so complicated that I would not try to do it.
There is a tool (but I forgot the name) that finds the DPI for images in PDF files. If you can find the DPI then you also can find the positions. I'm afraid the tool is closed source.

Attention to details is very important for a quest like this. The name of the library is still PDFsharp.

Author:  Thomas Hoevel [ Thu Apr 19, 2018 9:37 am ]
Post subject:  Re: How to get the position of an image relative to its page

Information about parsing the page contents can be found here (3rd post):
https://github.com/empira/PDFsharp/issues/49

Other samples can be found when you search this forum for "extract text".

Author:  prw56 [ Fri Apr 20, 2018 4:51 pm ]
Post subject:  Re: How to get the position of an image relative to its page

While I was digging around in PDFXplorer, I noticed that every time I added an image, a new quartet of lines appeared in the Contents element of the page that held the images. With 2 images it looks like this:

q
288 0 0 242.64 70.56 489.6 cm
/IMpndHOabj Do
Q
q
204.48 0 0 185.03999 381.6 547.2 cm
/IMdNmzydvg Do
Q

The cm line of each quartet has 4 non-zero numbers, which I have found correspond to: actual width, actual height, x (from left side of page to left side of image), and y (from bottom of page to bottom of image).
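
That reading matches how PDF places images: an image occupies the unit square in its own space, and the six cm operands a b c d e f map that square onto the page, in default user-space units of 1/72 inch. The two zeros are b and c, which are only non-zero when the image is rotated or skewed. A minimal Python sketch of the general case follows; the function name `placement` is hypothetical, and the sketch assumes the six numbers are the complete transformation in effect when Do runs.

```python
def placement(a, b, c, d, e, f):
    """Return (x, y, width, height) of an image's bounding box on the page.

    The unit-square corners (0,0), (1,0), (0,1), (1,1) are mapped through
    the matrix [a b c d e f]; the bounding box of the four results is
    returned, so rotated or skewed images are covered too.
    """
    xs = [e, a + e, c + e, a + c + e]
    ys = [f, b + f, d + f, b + d + f]
    x, y = min(xs), min(ys)
    return x, y, max(xs) - x, max(ys) - y


# First quartet from the listing above.
x, y, width, height = placement(288, 0, 0, 242.64, 70.56, 489.6)
print(x, y, width, height)
print("width in inches:", width / 72)
```

For the first quartet this gives x = 70.56 and y = 489.6, with a width of 288 and a height of 242.64 up to floating-point rounding, which agrees with the interpretation above. A width of 288 units is 4 inches.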

Now I'm trying to test editing the stream to move the image in PdfSharp, but I'm having trouble identifying the encoding of the Contents stream. Do you guys know where to retrieve the encoding from?

(Here's the pdf file I'm using btw: https://nofile.io/f/Nm4EsYYmDsW/singleImage+(2).pdf)
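
On the encoding question: a PDF stream declares how it is encoded in the /Filter entry of its stream dictionary. A common value for page content is /FlateDecode, which is zlib/deflate compression, and a stream without a /Filter entry is stored as plain bytes. The following is a minimal Python sketch of the decode, edit, re-encode cycle under the assumption that the only filter is /FlateDecode; the helper names are hypothetical, and whether this particular file uses that filter has not been checked.

```python
import zlib


def decode_flate(raw: bytes) -> str:
    """Inflate a /FlateDecode stream body and return it as text."""
    return zlib.decompress(raw).decode("latin-1")


def encode_flate(text: str) -> bytes:
    """Deflate edited content so it can be stored with /FlateDecode."""
    return zlib.compress(text.encode("latin-1"))


# Stand-in for the bytes read from the page's Contents stream.
original = "q\n288 0 0 242.64 70.56 489.6 cm\n/IMpndHOabj Do\nQ\n"
raw = encode_flate(original)

# Move the image by changing the last two cm operands (x and y).
edited = decode_flate(raw).replace("70.56 489.6 cm", "100 400 cm")
new_raw = encode_flate(edited)
print(decode_flate(new_raw))
```

After writing the re-encoded bytes back, the /Length entry of the stream dictionary has to match the new byte count.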

All times are UTC