Save HTML webpage to a PDF Document


You get to access it offline. It's more readable. You can annotate, leave comments, and so on from a PDF client (Adobe Acrobat Reader, anyone?). You can even track your progress inside the document. If your client is really friendly, it can even reopen the document where you left off.

Well, for me it's merely a matter of convenience. I always prefer a PDF manual to an HTML/CHM one for some reason. Does it really matter if the PDF is three times the size of the HTML archive? Storage isn't really a pressing concern these days, is it?


Multiple HTML pages to a Single PDF.

There are a lot of websites that offer to convert an HTML document to PDF on the fly. That doesn't serve the purpose here, and I'm not very big on registering everywhere!

For our little experiment, let's pick a URL. I suggest

(for further reading go to by Eric S. Raymond).

#1 Download web pages recursively using wget

# create a directory under home; Think: less clutter
$ mkdir ~/our_little_experiment; cd ~/our_little_experiment;

# download the web page, then recursively download every page linked from it, into the current directory
$ wget -v -r

FINISHED --2012-10-07 20:15:47--
Total wall clock time: 2m 51s
Downloaded: 16 files, 164K in 1.9s (88.7 KB/s)
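If you want a bit more control over the download step, a wrapper along the following lines may help. This is only a sketch: the `mirror_pages` name and the `DRY_RUN` switch are made up for illustration, and the URL is whatever start page you picked. The extra options (`--level`, `--no-parent`, `--no-directories`, `--convert-links`) are standard wget options, but adding them is my assumption and not part of the original command. `--no-directories` is there so the files land directly in the current directory.

```shell
# Hypothetical helper: recursively fetch a page and the pages it links to.
# Usage: mirror_pages URL [DEPTH]
# Set DRY_RUN=1 to print the wget command instead of running it.
mirror_pages() {
    url=$1
    depth=${2:-5}   # 5 is wget's own default recursion depth
    # Assemble the command line as the function's positional parameters.
    set -- wget -v -r --level="$depth" --no-parent --no-directories --convert-links "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

Running it with `DRY_RUN=1` first lets you read the exact command before anything is fetched.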

A little later …

# let's look at the files generated by wget (an awesome tool, btw!)
$ cd ~/our_little_experiment/

$ ls -1

# index.html is actually chapter 1, so rename it to ar01s01.html
$ mv index.html ar01s01.html
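The rename matters because the shell expands `*.html` in sorted order, and that is the order in which the files are later handed to htmldoc. A quick way to see the effect is sketched below. It works in a scratch directory, and the names `ar01s02.html` and `ar01s03.html` are stand-ins I made up for the other downloaded chapters, not actual wget output.

```shell
# Scratch directory with stand-in files (the file names are assumptions).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch ar01s02.html ar01s03.html index.html

# Before the rename, index.html sorts after the ar01s* chapters.
before=$(printf '%s ' *.html)

mv index.html ar01s01.html

# After the rename, chapter 1 leads the list.
after=$(printf '%s ' *.html)

echo "before: $before"
echo "after:  $after"
```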

#2 install htmldoc

$ sudo apt-get install htmldoc

#3 create the PDF document

$ htmldoc --webpage -t pdf14 -v -f catb_cathedral_bazaar.pdf *.html

Output: catb_cathedral_bazaar.pdf
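If you plan to repeat this for other documents, the htmldoc call can be wrapped in a small function. The sketch below is one possible shape, not a definitive script: `make_pdf` is a hypothetical name, and the only htmldoc options it uses are the ones from the command above.

```shell
# Hypothetical wrapper around the htmldoc call above.
# Usage: make_pdf OUTPUT.pdf FILE.html [FILE.html ...]
make_pdf() {
    if [ "$#" -lt 2 ]; then
        echo "usage: make_pdf OUTPUT.pdf FILE.html [FILE.html ...]" >&2
        return 2
    fi
    if ! command -v htmldoc >/dev/null 2>&1; then
        echo "htmldoc not found; install it first" >&2
        return 127
    fi
    out=$1
    shift
    # Same options as above: web page mode, PDF 1.4 output, verbose.
    htmldoc --webpage -t pdf14 -v -f "$out" "$@"
}
```

Calling `make_pdf catb_cathedral_bazaar.pdf *.html` in the download directory would then run the same conversion as the one-liner.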


All the links work perfectly.

I’d be happy to guide anyone doing it on w32.

Please don’t use “win” as an abbreviation for Microsoft Windows in GNU software or
documentation. In hacker terminology, calling something a “win” is a form of praise. If you
wish to praise Microsoft Windows when speaking on your own, by all means do so, but not
in GNU software. Usually we write the name “Windows” in full, but when brevity is very
important (as in file names and sometimes symbol names), we abbreviate it to “w”. For
instance, the files and functions in Emacs that deal with Windows start with ‘w32’.

— GNU Standards


Happy Hacking!
