PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

All times are UTC





Post subject: Test if a file is valid
Posted: Fri Sep 23, 2016 2:11 pm

Joined: Fri Sep 23, 2016 2:09 pm
Posts: 1
Location: Italy
I'm processing a huge list of PDF files and I need to find out which files are valid and which are corrupted. When I find a corrupted file, I can request the correct copy from a service and substitute it in the list.
That recovery is costly in terms of processing power and bandwidth, so I want to call that service as rarely as possible.
Only the corrupted files should be recovered, so I need a function that I can call for each file in the list.
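The per-file workflow described above can be sketched as a loop that runs a cheap check on every file and calls the expensive recovery service only for files that fail it. This is a minimal sketch: `RecoverCorruptFiles` is a hypothetical name, and `isValid` and `recover` are hypothetical delegates standing in for the validation function and the replacement service, neither of which is a real API.

```vbnet
Imports System
Imports System.Collections.Generic

Module RecoveryLoop
    ' Calls the (expensive) recover delegate only for files whose (cheap)
    ' isValid check fails, and returns the paths that were recovered.
    ' Both delegates are hypothetical placeholders supplied by the caller.
    Function RecoverCorruptFiles(files As IEnumerable(Of String),
                                 isValid As Func(Of String, Boolean),
                                 recover As Action(Of String)) As List(Of String)
        Dim recovered As New List(Of String)()
        For Each path As String In files
            If Not isValid(path) Then
                recover(path)       ' e.g. fetch the correct copy and replace the bad one
                recovered.Add(path)
            End If
        Next
        Return recovered
    End Function
End Module
```

Passing the check and the recovery step as delegates keeps the loop independent of whichever library ends up doing the validation.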

I would like an evaluation function, based on a free library, that tells me whether a PDF file is good or corrupt, the same way Acrobat Reader either displays the pages or shows an alert with the error.
A file can be corrupted for several reasons:

- it was transferred earlier in the wrong FTP mode (ASCII instead of binary)
- it was truncated by an earlier, unstable FTP upload
- it is damaged for unknown reasons
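The first two failure modes usually leave marks in the file structure that can be checked cheaply, before any PDF library is involved: a truncated upload typically loses the trailing `%%EOF` marker, and an ASCII-mode FTP transfer adds or removes CR bytes, which shifts the byte offset stored after the last `startxref` keyword so that it no longer points at the cross-reference section. The sketch below is a heuristic pre-filter under those assumptions, not a substitute for parsing the file. It uses only System.IO; `LooksStructurallyValid` and `ReadAt` are hypothetical names. It is deliberately strict (header required at byte 0), and viewers such as Acrobat may silently repair a broken cross-reference offset that this check rejects.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module PdfStructureCheck
    ' Heuristic check: header at byte 0, "%%EOF" and "startxref" in the last
    ' 1024 bytes, and the startxref offset pointing at "xref" (classic table)
    ' or at "N G obj" (cross-reference stream, PDF 1.5+).
    Function LooksStructurallyValid(path As String) As Boolean
        ' ASCII decoding maps each byte to one char, so string indexes equal byte offsets
        Dim ascii As Encoding = Encoding.ASCII
        Using fs As New FileStream(path, FileMode.Open, FileAccess.Read)
            If fs.Length < 32 Then Return False

            ' 1. Header
            If ascii.GetString(ReadAt(fs, 0, 5)) <> "%PDF-" Then Return False

            ' 2. Trailer markers near the end (a truncated file usually lacks them)
            Dim tailLen As Integer = CInt(Math.Min(1024L, fs.Length))
            Dim tail As String = ascii.GetString(ReadAt(fs, fs.Length - tailLen, tailLen))
            If tail.LastIndexOf("%%EOF", StringComparison.Ordinal) < 0 Then Return False
            Dim sx As Integer = tail.LastIndexOf("startxref", StringComparison.Ordinal)
            If sx < 0 Then Return False

            ' 3. Parse the byte offset that follows the last "startxref"
            Dim rest As String = tail.Substring(sx + "startxref".Length).TrimStart()
            Dim digits As Integer = 0
            While digits < rest.Length AndAlso Char.IsDigit(rest(digits))
                digits += 1
            End While
            Dim offset As Long
            If digits = 0 OrElse Not Long.TryParse(rest.Substring(0, digits), offset) Then Return False
            If offset >= fs.Length Then Return False

            ' 4. The offset must land on a cross-reference section; CR bytes added
            '    or removed by an ASCII-mode transfer shift it away from there
            Dim here As String = ascii.GetString(ReadAt(fs, offset, CInt(Math.Min(32L, fs.Length - offset))))
            Return here.StartsWith("xref", StringComparison.Ordinal) OrElse
                   Regex.IsMatch(here, "^\d+\s+\d+\s+obj")
        End Using
    End Function

    ' Reads up to count bytes starting at position (fewer at end of file).
    Private Function ReadAt(fs As FileStream, position As Long, count As Integer) As Byte()
        Dim buffer(count - 1) As Byte
        fs.Seek(position, SeekOrigin.Begin)
        Dim total As Integer = 0
        While total < count
            Dim n As Integer = fs.Read(buffer, total, count - total)
            If n = 0 Then Exit While
            total += n
        End While
        If total < count Then Array.Resize(buffer, total)
        Return buffer
    End Function
End Module
```

Only the header and the last 1 KB are read, so this stays cheap on large files; a file that passes can still go through the library-based check, and a file that fails can go straight to the recovery service.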

So far I have developed the following solution, which is far from optimal because PdfSharp does not implement some PDF features (such as cross-reference "iref" streams):

Code:
Imports PdfSharp.Pdf
Imports PdfSharp.Pdf.IO
Imports ITextPdf = iTextSharp.text.pdf

Module PdfValidator
  Function IsPdfValid(file As String) As Boolean
    Dim doLoaded As Boolean = False
    Dim reader As ITextPdf.PdfReader = Nothing

    Try
      ' quick'n dirty test: PdfSharp returns 0 if the file is not recognized as a PDF
      Dim tested As Integer = PdfReader.TestPdfFile(file)
      If tested = 0 Then Return False

      ' open attempt with iTextSharp
      reader = New ITextPdf.PdfReader(New ITextPdf.RandomAccessFileOrArray(file, True), Nothing)
      If reader.NumberOfPages <= 0 Then Return False

      ' open attempt with PdfSharp; touch every page so a damaged page tree throws here
      Using doc As PdfDocument = PdfReader.Open(file, PdfDocumentOpenMode.ReadOnly)
        If doc.PageCount <= 0 Then Return False
        For i As Integer = 0 To doc.PageCount - 1
          Dim page As PdfPage = doc.Pages(i)
        Next
      End Using
      doLoaded = True

    Catch ex As Exception When ex.Message.StartsWith("Cannot handle iref streams")
      ' This PdfSharp version cannot read iref streams (Acrobat 6 / PDF 1.5 feature),
      ' but iTextSharp has already opened the file, so treat it as valid
      doLoaded = True
    Catch ex As Exception
      ' any other exception means the file is corrupted
      doLoaded = False
    Finally
      If reader IsNot Nothing Then
        reader.Close()
      End If
    End Try

    Return doLoaded
  End Function
End Module


Any suggestions?

