PDFsharp & MigraDoc Foundation

PDFsharp - A .NET library for processing PDF & MigraDoc Foundation - Creating documents on the fly

Posted: Tue Nov 09, 2010 10:14 am
Hello,

I need to generate a PDF from any file uploaded by a user of my application, whether it is txt, doc, docx, xls, xlsx, jpg, gif, etc.

Can I achieve this using PDFsharp and/or MigraDoc?

Any help with this is much appreciated. Thanks.


Posted: Wed Nov 10, 2010 1:07 pm
Hi,

As long as you can extract data from the uploaded file, you can put that data into a PDF. For example, with a .txt file it would be simple:
- read the text from the uploaded FileStream (or save the FileStream, then open it and read the text) into a string
- create new PDFsharp/MigraDoc objects
- add the text
- render the pages to PDF (if you used MigraDoc)
- save the PDF or output it to the browser
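
To make those steps a bit more concrete, here is a minimal sketch using MigraDoc. It assumes the MigraDoc and PDFsharp assemblies are referenced and that a usable font is available on the machine; the class name TextToPdf, the method name Convert and the one-paragraph-per-line layout are made up for this example, not something prescribed by the library.

```csharp
using System.IO;
using MigraDoc.DocumentObjectModel;
using MigraDoc.Rendering;

public static class TextToPdf
{
    // Hypothetical helper: reads a text file and writes it to a PDF,
    // using one MigraDoc paragraph per line of input.
    public static void Convert(string textPath, string pdfPath)
    {
        // 1. Read the text from the uploaded (and saved) file.
        string[] lines = File.ReadAllLines(textPath);

        // 2. Create the MigraDoc objects.
        Document document = new Document();
        Section section = document.AddSection();

        // 3. Add the text.
        foreach (string line in lines)
        {
            section.AddParagraph(line);
        }

        // 4. Render the pages to PDF.
        PdfDocumentRenderer renderer = new PdfDocumentRenderer();
        renderer.Document = document;
        renderer.RenderDocument();

        // 5. Save the PDF (it could be written to a stream instead).
        renderer.PdfDocument.Save(pdfPath);
    }
}
```

Calling TextToPdf.Convert(@"C:\My_Files\Test.txt", @"C:\My_Files\Test.pdf") would then produce the PDF. Because every input line becomes its own paragraph, the line breaks of the text file survive and long lines are wrapped by MigraDoc.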

With regard to simply converting any file format into PDF, it's not really that simple. I would allow a particular set of file types to be uploaded, and then specify rules for extracting data from each. Image files can be turned into PDFs directly. I save them on the server first and load them by path, because I have not managed to add an image to a PDF from a memory stream; as far as I know they have to exist on the filesystem the code is running from. Depending on your PDFsharp version there may also be a stream-based way to load images, so check the XImage class.
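
Restricting uploads to a fixed set of file types could look roughly like the sketch below. It uses only the .NET base class library; the class name UploadFilter and the exact list of extensions are assumptions for illustration, so adjust them to the formats you actually have extraction rules for.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class UploadFilter
{
    // Hypothetical whitelist of upload formats; compared case-insensitively.
    private static readonly HashSet<string> AllowedExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            ".txt", ".doc", ".docx", ".xls", ".xlsx", ".jpg", ".gif"
        };

    // Returns true when the file name ends in one of the allowed extensions.
    public static bool IsAllowed(string fileName)
    {
        string extension = Path.GetExtension(fileName);
        return !string.IsNullOrEmpty(extension)
            && AllowedExtensions.Contains(extension);
    }
}
```

Checking the extension only tells you what the user claims the file is, so each extraction routine should still be prepared for content that does not match.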

If you need it to be truly any format, then I would suggest having something like CutePDF installed on the users' machines to "print to PDF" (a PDF file is created instead of pieces of paper coming out of a printer). The native applications they used to create or view the files can then do the work together with CutePDF.

Hope this helps,

Mike


Posted: Thu Nov 18, 2010 6:24 am
Hi mikesowerbutts,

I am trying the following code:

Code:
    private void Test()
    {
        string sourceFileName = @"C:\My_Files\Test.txt";
        //string sourceFileName = @"C:\My_Files\Test.doc";
        //string sourceFileName = @"C:\My_Files\Test.xlsx";

        StringBuilder FileTextBuilder = new StringBuilder();
        using (FileStream ReadStream = File.OpenRead(sourceFileName))
        {
            byte[] DataTransit = new byte[ReadStream.Length + 1];
            UTF8Encoding DataEncoding = new UTF8Encoding(true);
            while (ReadStream.Read(DataTransit, 0, DataTransit.Length) > 0)
            {
                FileTextBuilder.Append(DataEncoding.GetString(DataTransit));
            }
        }

        string sourceContent = FileTextBuilder.ToString();

        PdfDocument document = new PdfDocument();
        document.Info.Title = "Created with PDFsharp";

        PdfPage page = document.AddPage();
        XGraphics gfx = XGraphics.FromPdfPage(page);
        XFont font = new XFont("Verdana", 10, XFontStyle.Regular);

        gfx.DrawString(sourceContent, font, XBrushes.Black,
          new XRect(0, 0, page.Width, page.Height),
          XStringFormats.TopLeft);

        string destinationFileName = @"C:\My_Files\Test.pdf";
        document.Save(destinationFileName);
        document.Close();
    }


The problem is that when I read a well-formatted text file, the resulting PDF does not keep the same formatting, as it expects a string parameter and not a byte[]. In the case of a .doc file, junk characters are written to the PDF.

Let me know whether this is the approach you suggested, or whether you meant something else.

Thank you.


Posted: Thu Nov 18, 2010 12:30 pm
Hi,

I would suggest a more specific way of extracting the data for each file type, something more like this:
Code:
// Needs: using System; using System.IO; using System.Text;
//        using PdfSharp.Drawing; using PdfSharp.Pdf;
private void Test()
    {
        string sourceFileName = @"C:\My_Files\Test.txt";
        //string sourceFileName = @"C:\My_Files\Test.doc";
        //string sourceFileName = @"C:\My_Files\Test.xlsx";

        StringBuilder FileTextBuilder = new StringBuilder();
        string TBFormatted = "";
        // Compare the actual extension: Contains(".doc") would also match ".docx"
        string extension = Path.GetExtension(sourceFileName).ToLowerInvariant();

        if (extension == ".txt")
        {
            using (FileStream ReadStream = File.OpenRead(sourceFileName))
            {
                byte[] DataTransit = new byte[ReadStream.Length];
                UTF8Encoding DataEncoding = new UTF8Encoding(true);
                int bytesRead;
                while ((bytesRead = ReadStream.Read(DataTransit, 0, DataTransit.Length)) > 0)
                {
                    // decode only the bytes that were actually read
                    FileTextBuilder.Append(DataEncoding.GetString(DataTransit, 0, bytesRead));
                }
            }
            //read in your .txt file to a string and format it. PDFsharp/MigraDoc like to use \r for new line characters, whereas other formatters use \n and sometimes \r\n, it will depend on the content
            TBFormatted = FormatTXT(FileTextBuilder.ToString());
        }
        else if (extension == ".doc")
        {
            //read in your .doc file to a string and format it
            TBFormatted = ExtractDOC(sourceFileName);
        }
        else if (extension == ".xlsx")
        {
            //read in your .xlsx file to a string and format it
            TBFormatted = ExtractXLSX(sourceFileName);
        }

        string sourceContent = TBFormatted;
        PdfDocument document = new PdfDocument();
        document.Info.Title = "Created with PDFsharp";

        PdfPage page = document.AddPage();
        XGraphics gfx = XGraphics.FromPdfPage(page);
        XFont font = new XFont("Verdana", 10, XFontStyle.Regular);

        gfx.DrawString(sourceContent, font, XBrushes.Black,
          new XRect(0, 0, page.Width, page.Height),
          XStringFormats.TopLeft);

        string destinationFileName = @"C:\My_Files\Test.pdf";
        document.Save(destinationFileName);
        document.Close();
    }
private string FormatTXT(string input)
{
    // placeholder: fix up the line endings here
    return input;
}
private string ExtractDOC(string fileName)
{
    // placeholder: extract the text of the Office 2003 document here
    throw new NotImplementedException();
}
private string ExtractXLSX(string fileName)
{
    // placeholder: extract the cell values of the Open XML workbook here
    throw new NotImplementedException();
}


That code probably isn't 100% correct, as I've not tested it and I'm slightly unfamiliar with the way you're reading the text from the file, but hopefully you get the idea.

I read text in like this:
Code:
public static string LoadFile(string filePath)
        {
            // "using" closes and disposes the reader even if ReadToEnd throws
            using (TextReader tr = new StreamReader(filePath))
            {
                return tr.ReadToEnd();
            }
        }
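
Once the text is in a string, the line endings still need sorting out before it is drawn. One possible way, sketched here with plain .NET string methods, is to bring all three newline styles down to one and then split the text into lines that can be drawn or added one at a time. The names TextCleanup, NormalizeLineEndings and SplitLines are invented for this example.

```csharp
using System;

public static class TextCleanup
{
    // Converts Windows ("\r\n") and old Mac ("\r") line endings to "\n".
    public static string NormalizeLineEndings(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        // Replace the two-character sequence first so it is not turned into two breaks.
        return input.Replace("\r\n", "\n").Replace("\r", "\n");
    }

    // Splits the text into individual lines, whatever newline style it used.
    public static string[] SplitLines(string input)
    {
        return NormalizeLineEndings(input).Split('\n');
    }
}
```

Working with an array of lines sidesteps the question of which newline character the rendering code expects, since each line can be placed on the page separately.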


I think that because the formats are different, you will have to extract the data from them in different ways. .txt shouldn't be too much of an issue, but for .doc and .xls look at the Office 2003 format, and for .docx and .xlsx look at the Office 2007 format and Open XML (this is the format used to save Office 2007+ files). Open XML has its own set of objects for interpreting Office 2007 files (and also OpenOffice.org, I believe); look at http://openxmldeveloper.org for more info.
Because different file formats store data in different ways, I think you should limit the number of possible upload formats, as you will have to address each one separately.
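
To give an idea of what extraction from an Office 2007+ file involves: a .docx is a zip package containing XML parts. The sketch below does not use the Open XML object model mentioned above; it swaps in the plain .NET ZipArchive and XDocument classes to pull the paragraph text out of the package. It assumes the main document part sits at the conventional path word/document.xml, and the names DocxText and ReadParagraphs are made up for the example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxText
{
    // Main WordprocessingML namespace used inside .docx files.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the plain text of each paragraph (w:p) in the main document part.
    public static List<string> ReadParagraphs(Stream docxStream)
    {
        using (ZipArchive archive = new ZipArchive(docxStream, ZipArchiveMode.Read, true))
        {
            // Assumption: the main part is stored at its conventional location.
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a .docx file?");

            using (Stream partStream = entry.Open())
            {
                XDocument xml = XDocument.Load(partStream, LoadOptions.PreserveWhitespace);
                // Join the text runs (w:t) of each paragraph into one string.
                return xml.Descendants(W + "p")
                          .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)))
                          .ToList();
            }
        }
    }
}
```

This only recovers the raw text. Formatting, tables, images and headers are ignored, which is a good illustration of why each format needs its own rules. The older binary .doc and .xls formats cannot be read this way at all.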

Hope this helps,

Mike

