Right now we use PDFBox to extract text. It has a class called PDFTextStripperByArea: you give it a search area (a rectangle on the page) and it returns the text found inside that area. Does PDFSharp have something similar, or some way for me to get the text from a given region myself?
Here is part of our PDFBox code:
Code:
public static String extractSSN(final PDPage page) throws IOException
{
    // Stripper object.
    final PDFTextStripperByArea stripper = new PDFTextStripperByArea();
    stripper.setSortByPosition(true);

    // Set the area to search on the PDF.
    final Rectangle searchArea = new Rectangle(0, 130, 100, 10);
    stripper.addRegion("ssn", searchArea);

    // Extract the text from the area, then pluck the ssn from it.
    stripper.extractRegions(page);
    String text = stripper.getTextForRegion("ssn").replaceAll("c", "%");
    text = URLDecoder.decode(text, "UTF-8");

    // Return the portion of the string we need.
    String output = "";
    try
    {
        output = text.substring(18, 29);
    }
    catch (final Exception ex)
    {
        output = "EMPTY";
    }
    return output;
}
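
To show what "get the text myself" could look like: if a library can hand back pieces of text together with their page coordinates, the region lookup is just a filter over those pieces. Below is a rough sketch of that idea in plain Java. It does not use PDFBox or PDFSharp, and it is not how PDFTextStripperByArea is implemented; the names RegionTextSketch, Fragment and textInRegion are made up for illustration, and it assumes y grows downward from the top of the page.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only; not part of PDFBox or PDFSharp.
final class RegionTextSketch {

    // One positioned piece of text, as a PDF text extractor might report it
    // (assumed shape: an origin point plus the string drawn there).
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Concatenates the text of every fragment whose origin lies inside the
    // region, ordered top-to-bottom and then left-to-right.
    static String textInRegion(List<Fragment> fragments, Rectangle2D region) {
        List<Fragment> hits = new ArrayList<>();
        for (Fragment f : fragments) {
            if (region.contains(f.x, f.y)) {
                hits.add(f);
            }
        }
        // Assumption: smaller y means higher on the page.
        hits.sort(Comparator.<Fragment>comparingDouble(f -> f.y)
                .thenComparingDouble(f -> f.x));

        StringBuilder sb = new StringBuilder();
        for (Fragment f : hits) {
            sb.append(f.text);
        }
        return sb.toString();
    }
}
```

With the same rectangle as above, this would be called as RegionTextSketch.textInRegion(fragments, new Rectangle2D.Double(0, 130, 100, 10)). So the real question is whether PDFSharp can give me positioned text to feed into something like this.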