
Convert html to pdf using iTextSharp

Can anyone who has used iTextSharp (chapter 0707) provide me with sample code to convert HTML to PDF? Please help, this is urgent!

 

19 Answers Found

 

Answer 1

What's the problem? Is the provided tutorial not working? Chap0707 (included below) converts Chap0702.html to a PDF document.

  Console.WriteLine("Chapter 7 example 7: parsing the html  from example 2");
       
  // step 1: creation of a document-object
  Document document = new Document(PageSize.A4, 80, 50, 30, 65);
           
  // step 2:
  // we create a writer that listens to the document
  // and directs a XML-stream to a file
  PdfWriter.getInstance(document, new FileStream("Chap0707.pdf", FileMode.Create));
       
  // step 3: we parse the document
  HtmlParser.parse(document, "Chap0702.html");

you can find a different set of tutorials here

 

Answer 2

 

No, it's not working. I tried it, but it still didn't work.
 

Answer 3

Sorry, you marked it as answered. Does your problem still exist?

Can you tell me in more detail what the problem is?

 

Answer 4

Ok? I guess if any other dumbasses are out there, please make sure you have the following at the top:

using iTextSharp.text;
using iTextSharp.text.html;
using iTextSharp.text.pdf;


Seems I forgot the html one, which is why I could not see HtmlParser. I found this on my own, although I swore I had it. Ugh, always double-check your using declarations. I hope this helps other idiots like me... :)

 

 

 ORIGINAL POST:

I think what he wants to say is that this does not work because HtmlParser does not seem to be part of iTextSharp. I keep seeing various examples floating around online about converting HTML to PDF, but they all reference HtmlParser, which does NOT seem to exist within iTextSharp. So where is this coming from? Do we have to install something else? Let me know, thanks...

To clarify before people start to say that HtmlParser is within iTextSharp... I am using the .NET version from http://sourceforge.net/projects/itextsharp/ so this is NOT THE JAVA version but the .NET VERSION and it does not seem to have HtmlParser, so can someone explain to me if I am doing something wrong?

 

Answer 5

I was having the same problem, but then I found that in the new version we need to use HTMLWorker.ParseToList() instead of HtmlParser.

 

Answer 6

I have iTextSharp version 4.1.6.0 and got it to work like this,

for parsing HTML that is not in a file:

 

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
using System.Collections;
using System.Text;
using iTextSharp.text.xml;
using iTextSharp.text.html;

 


public partial class itexttest : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // create the document
        Document document = new Document();
        try
        {
            // writer - use your own path, and make sure you have write permissions
            PdfWriter.GetInstance(document, new FileStream(Server.MapPath("/") + "WordDoc/" + "parsetest.pdf", FileMode.Create));
            document.Open();
            // HTML text - can come from a database or an editor too
            String htmlText = "<font  " +
                " color=\"#0000FF\"><b><i>Title One</i></b></font><font   " +
                " color=\"black\"><br><br>Some text here<br><br><br><font   " +
                " color=\"#0000FF\"><b><i>Another title here   " +
                " </i></b></font><font   " +
                " color=\"black\"><br><br>Text1<br>Text2<br><OL><LI>hi</LI><LI>how are u</LI></OL>";

            // parse into an ArrayList, using a StringReader since we are not reading from a file
            ArrayList htmlarraylist = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(new StringReader(htmlText), null);
            // add the collection to the document
            for (int k = 0; k < htmlarraylist.Count; k++)
            {
                document.Add((IElement)htmlarraylist[k]);
            }

            document.Add(new Paragraph("And the same with indentation...."));

            // or add the collection to a paragraph;
            // if you add it to an existing non-empty paragraph, it is inserted
            // at the index you specify
            Paragraph mypara = new Paragraph(); // an empty paragraph as "holder"
            mypara.IndentationLeft = 36;
            mypara.InsertRange(0, htmlarraylist);
            document.Add(mypara);
            document.Close();
        }
        catch (Exception exx)
        {
            Console.Error.WriteLine(exx.StackTrace);
            Console.Error.WriteLine(exx.Message);
        }
    }
}

 

Good luck!

 

Answer 7

Just for info:

I used null as the argument for the style sheet, but you can of course add one if you want to.

I use an editor, and all the formatting is stored in the DB as HTML.

 

Answer 8

I am getting this error at ArrayList htmlarraylist = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(new StringReader(htmlText), null); Cannot implicitly convert type 'System.Collections.Generic.List<iTextSharp.text.IElement>' to 'System.Collections.ArrayList'

 

Answer 9

What version of the DLL are you using? In practice you should be able to cast the result to an ArrayList.

If you send me your code snippet and the DLL version, I will try to help you out.

 

Cannot implicitly convert type 'System.Collections.Generic.List<iTextSharp.text.IElement>' to 'System.Collections.ArrayList' -->

Does that mean you can't convert it to an ArrayList? Did you try this?

ArrayList htmlarraylist = (ArrayList)iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(new StringReader(htmlText), null);

 

Answer 10

Sorry, I wrote that before checking. The generic list cannot be cast to an ArrayList.

I get the opposite error: if I change my code to use a generic list, I get "cannot convert ... ArrayList to generic ...".

It must be the DLL version, or are you maybe using VS2010 with .NET Framework 4.0? Something must be different. I use VS2008 with .NET 3.5.

But did you try simply making it a generic list,

for example of IElement: List<IElement> htmlarraylist? Then you should be able to iterate using Count.

A generic list example:

 List<string> telling = new List<string>();

            telling.Add("How");
            telling.Add("Are");
            telling.Add("You");
            for (int k = 0; k < telling.Count; k++)
            {
                Response.Write(telling[k]);
            }

Look at what kind of list the parser returns. I am just guessing IElement, and you may have to cast it. Hope this helps.

 

Answer 11

I just had to try this out of curiosity.

I used VS 2010 and the new iTextSharp DLL (5.0.2.0).

I got the same error: the generic list cannot be converted to an ArrayList.

So I just made a generic list of IElement and it worked. I am posting the code here; as you can see, it is the same except for the List<IElement> declaration.

Good luck.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
using System.Collections;
using System.Text;
using iTextSharp.text.xml;
using iTextSharp.text.html;

public partial class Default2 : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        //create document
        Response.Write(Server.MapPath("."));
        Document document = new Document();
        try
        {
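            // writer - use your own path!
            // Path.Combine adds the directory separator that plain string concatenation would omit
            PdfWriter.GetInstance(document, new FileStream(Path.Combine(Server.MapPath("."), "parsetest.pdf"), FileMode.Create));
            document.Open();
            // HTML text - can come from a database or an editor too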
            String htmlText = "<font  " +
        " color=\"#0000FF\"><b><i>Title One</i></b></font><font   " +
        " color=\"black\"><br><br>Some text here<br><br><br><font   " +
        " color=\"#0000FF\"><b><i>Another title here   " +
        " </i></b></font><font   " +
        " color=\"black\"><br><br>Text1<br>Text2<br><OL><LI>hi</LI><LI>how are u</LI></OL>";

            // parse into a list, using a StringReader since we are not reading from a file
           
            List<IElement> htmlarraylist = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(new StringReader(htmlText), null);

            //add the collection to the document
            for (int k = 0; k < htmlarraylist.Count; k++)
            {
                document.Add((IElement)htmlarraylist[k]);
            }

            document.Add(new Paragraph("And the same with indentation...."));

            // or add the collection to a paragraph;
            // if you add it to an existing non-empty paragraph, it is inserted
            // at the index you specify
            Paragraph mypara = new Paragraph(); // an empty paragraph as "holder"
            mypara.IndentationLeft = 36;
            mypara.InsertRange(0, htmlarraylist);
            document.Add(mypara);
            document.Close();

 


        }
        catch (Exception exx)
        {
            Console.Error.WriteLine(exx.StackTrace);
            Console.Error.WriteLine(exx.Message);
        }
    }
}

 

Answer 12

When I got the same error as you did:

 Cannot implicitly convert type 'System.Collections.Generic.List<iTextSharp.text.IElement>' to 'System.Collections.ArrayList'.....

the error message also named the type of the list elements.

So the return type was changed between versions, which is good to know anyhow.
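
One way to avoid depending on the exact return type may be to iterate with foreach and a typed loop variable, since both ArrayList and List<T> are enumerable. The sketch below does not call iTextSharp at all: ParseOldStyle and ParseNewStyle are hypothetical stand-ins for a parser that returns an ArrayList (as reported here for 4.1.6.0) or a generic list (as reported for 5.0.2.0), and strings stand in for IElement.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class ParseResultDemo
{
    // Hypothetical stand-in for the older behaviour: a non-generic ArrayList.
    static ArrayList ParseOldStyle()
    {
        return new ArrayList { "Title One", "Some text here" };
    }

    // Hypothetical stand-in for the newer behaviour: a generic list.
    static List<string> ParseNewStyle()
    {
        return new List<string> { "Title One", "Some text here" };
    }

    // Accepts any IEnumerable, so it compiles against either return type.
    // The typed loop variable makes foreach cast each item for us.
    public static int AddAll(IEnumerable parsed, List<string> target)
    {
        int count = 0;
        foreach (string element in parsed)
        {
            target.Add(element);
            count++;
        }
        return count;
    }

    public static void Main()
    {
        var target = new List<string>();
        AddAll(ParseOldStyle(), target);
        AddAll(ParseNewStyle(), target);
        Console.WriteLine(target.Count); // prints 4
    }
}
```

In the code from this thread, the equivalent would presumably be a loop of the form foreach (IElement element in ...ParseToList(...)) that calls document.Add(element), but that is untested here.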

 

Answer 13

hoping I can hop onto this thread...

I'm storing HTML-formatted text in a database field, mostly just paragraphs and ordered and unordered lists. I'm trying to figure out how I can fetch that data and map it to an AcroField in an already established PDF template file. I'm able to map it right now and parse the HTML using HTMLWorker, but I'm losing all of the formatting (no paragraphs, no bulleted lists, etc.).

Do you have any suggestions?

Thanks 

 
 

Answer 15

my bad. 

 

Answer 16

You can try the Winnovative html to pdf converter library for .net or the free html to pdf online service. 

 

Answer 17

Hello, my post has to do with the code above, as I am using it to parse HTML. I have run into a problem with images in this code.

I am using iTextSharp to convert an HTML file to PDF, and I am unable to parse the HTML file successfully on the web server back end. Even with "absolute" or "relative" image references, iTextSharp fails and says "unable to find file c:\my_image.jpg". I don't understand this, because I never specified c:\my_image.jpg as a path!

Here is a snippet of the HTML, which I build with a StringBuilder before using it:

            strSelectUserListBuilder.Append("<table border='0' width='600' cellspacing='0' cellpadding='0'>" + strNL.ToString());
            strSelectUserListBuilder.Append("<tr>" + strNL.ToString());
            strSelectUserListBuilder.Append("<td>" + strNL.ToString());
            strSelectUserListBuilder.Append("<p align='center'><img border='0' src='images/ResumeTopBorderBrown.jpg' width='600' height='10'><br>" + strNL.ToString());
            strSelectUserListBuilder.Append("<font face='Arial' size='3' color='#876E3A'><b>Consultants<br></b></font>" + strNL.ToString());
            strSelectUserListBuilder.Append("<img border='0' src='images/ResumeBottomBorderBrown.jpg' width='600' height='10'>" + strNL.ToString());
            strSelectUserListBuilder.Append("</td>" + strNL.ToString());
            strSelectUserListBuilder.Append("</tr>" + strNL.ToString());

            strSelectUserListBuilder.Append(" <tr><td>" + strNL.ToString());

            //RippleEffectStaffList(Status);

            strSelectUserListBuilder.Append("</td></tr>" + strNL.ToString());

            // FOOTER
            strSelectUserListBuilder.Append("<tr>" + strNL.ToString());
            strSelectUserListBuilder.Append("<td>" + strNL.ToString());
            strSelectUserListBuilder.Append("<p align='center'><img border='0' src='images/ResumeBottomBorderBrown.jpg' width='600' height='10'>" + strNL.ToString());
            strSelectUserListBuilder.Append("</td>" + strNL.ToString());
            strSelectUserListBuilder.Append("</tr>" + strNL.ToString());
            strSelectUserListBuilder.Append("</table>" + strNL.ToString());
            strSelectUserListBuilder.Append("<br><br><br>" + strNL.ToString());


 
Here is the code, which is basically the same as in this thread:

            //create document
            Response.Write(Server.MapPath("." + @"\Resumes"));
            Document document = new Document();
            try
            {
                // writer - use your own path!
                PdfWriter.GetInstance(document, new FileStream(Server.MapPath(".") + @"\Resumes\HTML-to-PDF.pdf", FileMode.Create));
                document.Open();




                //Here is where your HTML source goes................
                String htmlText = strSelectUserListBuilder.ToString();


                // parse into a list, using a StringReader since we are not reading from a file

                List<IElement> htmlarraylist = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(new StringReader(htmlText), null);

                //add the collection to the document
                for (int k = 0; k < htmlarraylist.Count; k++)
                {
                    document.Add((IElement)htmlarraylist[k]);
                }

                //document.Add(new Paragraph("And the same with indentation...."));

                // or add the collection to a paragraph;
                // if you add it to an existing non-empty paragraph, it is inserted
                // at the index you specify
                Paragraph mypara = new Paragraph(); // an empty paragraph as "holder"
                mypara.IndentationLeft = 36;
                mypara.InsertRange(0, htmlarraylist);
                document.Add(mypara);
                document.Close();




            }
            catch (Exception exx)
            {
                Response.Write("<br>____________________________________<br>");
                Response.Write("<br>Error: " + exx + "<br>");
                Response.Write("<br>StackTrace: " + exx.StackTrace + "<br>");
                Response.Write("<br>strPDFDocument: " + strPDFDocument.ToString() + "<br>");
                Response.Write("<br>strSelectUserListBuilder: " + strSelectUserListBuilder.ToString() + "<br>");

                //Console.Error.WriteLine(exx.StackTrace);
                //Console.Error.WriteLine(exx.StackTrace);
            }
            finally
            {
                //document.Close();
            }


 

Here is the error that I am getting with the code:

Error: System.Net.WebException: Could not find a part of the path 'c:\images\ResumeTopBorderBrown.jpg'. ---> 
System.Net.WebException: Could not find a part of the path 'c:\images\ResumeTopBorderBrown.jpg'. --->
System.IO.DirectoryNotFoundException: Could not find a part of the path 'c:\images\ResumeTopBorderBrown.jpg'.


 

As you can also see, I use Response.Write at the end as a test, and it displays the images fine in the browser.

What is or could be the issue here? 

 

 

Answer 18

Hi

The problem is that iTextSharp wants to add the image to the PDF with iTextSharp.text.Image, and giving only a short path in the image tag confuses the parser, which doesn't know what the full path is.

I have done this with ordinary PDFs, and when you add an image you use this:

iTextSharp.text.Image.GetInstance(Server.MapPath("/") + "/images/Mypic.jpg");  (as an example)

But when you have HTML "text", it wants the actual URL (for a reason I don't really know).

So to solve your problem when parsing on the web server, you can add this before your HTML string:

String UrlDirectory = Request.Url.GetLeftPart(UriPartial.Path);
UrlDirectory = UrlDirectory.Substring(0, UrlDirectory.LastIndexOf("/")+1);

That gives you the directory URL (http://yoursite/subcat/).

To this you just append the path in the image tag:

<img border='0' src='" + UrlDirectory + "/images/ResumeBottomBorderBrown.jpg' width='600' height='10'>

so it can find the picture
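
To see what the two UrlDirectory lines compute, here is a small stand-alone sketch that runs outside ASP.NET. It assumes a made-up page URL (http://yoursite/subcat/Default3.aspx?id=7) in place of Request.Url; the helper name GetUrlDirectory is hypothetical.

```csharp
using System;

public static class UrlDirectoryDemo
{
    // Same two steps as in the snippet above, applied to any page URL.
    public static string GetUrlDirectory(Uri pageUrl)
    {
        // Scheme, host and path, without the query string.
        string path = pageUrl.GetLeftPart(UriPartial.Path);
        // Cut off the page name, keeping the trailing slash.
        return path.Substring(0, path.LastIndexOf("/") + 1);
    }

    public static void Main()
    {
        // Made-up URL standing in for Request.Url.
        Uri page = new Uri("http://yoursite/subcat/Default3.aspx?id=7");
        Console.WriteLine(GetUrlDirectory(page)); // prints http://yoursite/subcat/
    }
}
```

Because the result already ends with a slash, the image path appended to it should not start with one.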

 

So the complete solution would look like this (all code):

I have tried it with your code, and it works with VS 2008 and iTextSharp 5.0.2.

Hope this helps you out.

Good luck!

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.IO;
using System.Collections;
using iTextSharp.text.xml;
using iTextSharp.text.html;

public partial class Default3 : System.Web.UI.Page
{
    String strSelectUserListBuilder = "";
    protected void Page_Load(object sender, EventArgs e)
    {
        //create document  
        //Response.Write(Server.MapPath("."));
        Document document = new Document();
        try
        {
            String UrlDirectory = Request.Url.GetLeftPart(UriPartial.Path);
            UrlDirectory = UrlDirectory.Substring(0, UrlDirectory.LastIndexOf("/")+1);
            Response.Write(UrlDirectory);
            // writer - use your own path!
            // Path.Combine adds the directory separator that plain string concatenation would omit
            PdfWriter.GetInstance(document, new FileStream(Path.Combine(Server.MapPath("."), "HTML-to-PDF.pdf"), FileMode.Create));
            document.Open();
          strSelectUserListBuilder = "<table border='0' width='600' cellspacing='0' cellpadding='0'>" +
        "<tr>" +
        "<td>" +
        "<p align='center'><img border='0' src='" + UrlDirectory + "/images/ResumeBottomBorderBrown.jpg' width='600' height='10'><br>" +
        "<font face='Arial' size='3' color='#876E3A'><b>Consultants<br></b></font>" +
          "<img border='0' src='" + UrlDirectory + "images/ResumeBottomBorderBrown.jpg' width='600' height='10'>" +
        "</td>" +
        "</tr>" +
        " <tr><td>" +
        "</td></tr>" +

        // FOOTER  
        "<tr>" +
        "<td>" +
            "<p align='center'><img border='0' src='"  + UrlDirectory   + "images/ResumeBottomBorderBrown.jpg' width='600' height='10'>" +
        "</td>" +
        "</tr>" +
        "</table>" +
        "<br><br><br>";
            //Here is where your HTML source goes................  
            String htmlText = strSelectUserListBuilder.ToString();


            // parse into a list, using a StringReader since we are not reading from a file

            List<IElement> htmlarraylist = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(new StringReader(htmlText), null);

            //add the collection to the document  
            for (int k = 0; k < htmlarraylist.Count; k++)
            {
                IElement x = (IElement)htmlarraylist[k];
                Response.Write(x.Type.ToString() + "#<br>");
                // document.Add((IElement)htmlarraylist[k]); 


            }

            //document.Add(new Paragraph("And the same with indentation...."));  

            // or add the collection to a paragraph;
            // if you add it to an existing non-empty paragraph, it is inserted
            // at the index you specify
            Paragraph mypara = new Paragraph(); // an empty paragraph as "holder"
            mypara.IndentationLeft = 36;
            mypara.InsertRange(0, htmlarraylist);
            document.Add(mypara);
            document.Close();

 


        }
        catch (Exception exx)
        {
            Response.Write("<br>____________________________________<br>");
            Response.Write("<br>Error: " + exx + "<br>");
            Response.Write("<br>StackTrace: " + exx.StackTrace + "<br>");
            Response.Write("<br>strSelectUserListBuilder: " + strSelectUserListBuilder.ToString() + "<br>");

            //Console.Error.WriteLine(exx.StackTrace);  
            //Console.Error.WriteLine(exx.StackTrace);  
        }
        finally
        {
            //document.Close();  
        }
    }
}

 

 

 

 

Answer 19

P.S.

Of course the correct form is UrlDirectory + "images/..." as in the last two image tags, but it seems to work with UrlDirectory + "/images/..." too.
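
The difference between the two forms can be made visible with a short stand-alone sketch. It assumes http://yoursite/subcat/ as the directory URL; the helper name Resolve is hypothetical, and it uses System.Uri to combine the parts instead of string concatenation, which is a swapped-in technique and not what the code in this thread does.

```csharp
using System;

public static class ImageUrlDemo
{
    // Resolves an image src against a directory URL using standard URI rules.
    public static string Resolve(string urlDirectory, string src)
    {
        return new Uri(new Uri(urlDirectory), src).AbsoluteUri;
    }

    public static void Main()
    {
        string urlDirectory = "http://yoursite/subcat/"; // assumed directory URL

        // Plain concatenation with a leading slash produces a double slash.
        Console.WriteLine(urlDirectory + "/images/ResumeBottomBorderBrown.jpg");
        // prints http://yoursite/subcat//images/ResumeBottomBorderBrown.jpg

        // Relative src: resolved below the directory.
        Console.WriteLine(Resolve(urlDirectory, "images/ResumeBottomBorderBrown.jpg"));
        // prints http://yoursite/subcat/images/ResumeBottomBorderBrown.jpg

        // Root-relative src: resolved from the site root, not the directory.
        Console.WriteLine(Resolve(urlDirectory, "/images/ResumeBottomBorderBrown.jpg"));
        // prints http://yoursite/images/ResumeBottomBorderBrown.jpg
    }
}
```

The concatenated "/images/" form presumably works only because the web server tolerates the doubled slash, so the form without the leading slash is the safer one.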

 
 
 
