Generating high-quality portable PDF files

The usual way to compile a TeX source file is to generate a .dvi file with the tex or latex command and then convert it into a PostScript file with dvips. If a PDF file is required, it can be generated from the PostScript by ps2pdf. This can be problematic in two respects: the quality of images may degrade for no apparent reason, and the resulting PDF file may not display correctly on other systems.
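
As a minimal sketch, the conventional route described above could be wrapped in a small shell function. The function name tex_to_pdf and the base name document are placeholders chosen for this illustration; the three tools are called with their default settings, which is exactly the situation in which the problems mentioned above can show up.

```shell
# Conventional route: TeX source -> DVI -> PostScript -> PDF.
# tex_to_pdf is a hypothetical helper name; pass the base name without extension.
tex_to_pdf() {
    base=$1
    latex "$base.tex" &&                  # writes $base.dvi
    dvips -o "$base.ps" "$base.dvi" &&    # writes $base.ps
    ps2pdf "$base.ps" "$base.pdf"         # writes $base.pdf with default settings
}

# Usage: tex_to_pdf document
```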

A long time ago, I found on someone else's home page a ps2pdf command line which prevents both problems. (It seems to be extremely well hidden, as I did not manage to find it again. However, I made a note of it.) Here it is:

ps2pdf -sPAPERSIZE=a4  -dCompatibilityLevel=1.3  \
 -dEmbedAllFonts=true  -dSubsetFonts=true  -dMaxSubsetPct=100  \
 -dAutoFilterColorImages=false  -dColorImageFilter=/FlateEncode  \
 -dAutoFilterGrayImages=false  -dGrayImageFilter=/FlateEncode  \
 -dAutoFilterMonoImages=false  -dMonoImageFilter=/CCITTFaxEncode  \
 document.ps  document.pdf

I have since learned to understand the options. They are named, but hardly explained, in the ps2pdf documentation, which consists of the file Ps2pdf.htm in the Ghostscript documentation directory (use locate Ps2pdf.htm to find it).

The important options for the image quality are AutoFilter...Images=false and ...ImageFilter=/FlateEncode. The first set stops Ghostscript from automatically determining the "best" compression format, which tends to favour /DCTEncode, that is, lossy JPEG encoding. The second set manually selects the compression method: the lossless (de)flate encoding for colour and greyscale images, and CCITT encoding for monochrome images.
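
One way to check the effect is to look at how the images in the finished PDF are encoded. The following is only a sketch: it assumes the pdfimages tool from Poppler with its -list option, and it assumes that the encoding ("enc") appears in the ninth column of that listing, with the value jpeg for DCT-encoded images. The helper name jpeg_images is made up for this example.

```shell
# Print the rows of "pdfimages -list" output that describe JPEG (DCT) encoded images.
# Assumption: the encoding ("enc") is the ninth column, as in Poppler's pdfimages.
jpeg_images() {
    awk 'NR > 2 && $9 == "jpeg"'    # NR > 2 skips the two header lines
}

# Only runs when the tool and the file are actually present.
if command -v pdfimages >/dev/null 2>&1 && [ -f document.pdf ]; then
    pdfimages -list document.pdf | jpeg_images
fi
```

If the options worked as intended, the filter should print nothing for a PDF whose source images were not JPEG-encoded to begin with.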

The other options are for maximum compatibility of the generated PDF file. CompatibilityLevel sets the PDF version. The remaining options concern the embedding of fonts into the generated PDF. EmbedAllFonts=true is self-explanatory and causes the output file to be readable even on systems which lack some of the fonts used. SubsetFonts=true together with MaxSubsetPct=100 causes the fonts to be embedded only as subsets, no matter how many of their characters are used. This reduces your legal risk if you use copyrighted fonts, as embedding a font in full can amount to an illegal copy. Last, the option -sPAPERSIZE=a4 doesn't seem necessary unless you convert from some other size; replace a4 by letter if that is the paper size you use.

An alternative way to arrive at a PDF file, if you do not require a PostScript file, is to use pdftex or pdflatex instead of tex or latex. In my experience, pdflatex embeds all fonts by default, as subsets, so you are safe on both the compatibility and the copyright issue. However, to be able to use pdflatex, you have to convert graphics into PDF format (or PNG for pixel graphics). To avoid any loss of quality, this should be done with the same ps2pdf command line shown above. The options relating to font embedding should not be omitted, as vector graphics can contain text which requires fonts. The paper size option should be omitted.
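
For a single figure, the conversion might look like the sketch below. Several things here are assumptions for the sake of illustration: the figure is taken to be an EPS file, the helper name eps_to_pdf and the file names are invented, and the option -dEPSCrop (a Ghostscript option that crops the page to the EPS bounding box) is an addition that is not part of the command line quoted earlier.

```shell
# Convert one EPS figure to PDF with lossless image compression and embedded fonts.
# eps_to_pdf is a hypothetical helper; -dEPSCrop is an added assumption (see text).
eps_to_pdf() {
    in=$1
    out=${in%.eps}.pdf    # figure.eps -> figure.pdf
    ps2pdf -dEPSCrop -dCompatibilityLevel=1.3 \
        -dEmbedAllFonts=true -dSubsetFonts=true -dMaxSubsetPct=100 \
        -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode \
        -dAutoFilterGrayImages=false -dGrayImageFilter=/FlateEncode \
        -dAutoFilterMonoImages=false -dMonoImageFilter=/CCITTFaxEncode \
        "$in" "$out"
}

# Usage: eps_to_pdf figure.eps && pdflatex document.tex
```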

As an aside, the ps2pdf options above can be useful in other contexts as well. That is because ps2pdf is just a script that calls the Ghostscript interpreter (gs) and passes its options on unchanged. gs can be used for tasks as diverse as concatenating PDF files, with the command line

gs -dBATCH -dNOPAUSE -dSAFER -sDEVICE=pdfwrite -sOUTPUTFILE=output.pdf  \
   <ps2pdf options>  source1.pdf source2.pdf ...

where <ps2pdf options> stands for the options given above. The options conserving image quality are especially useful when putting the scanned pages of a document together (even the large copier at my office outputs single-page PDF files unless you can put a stack of loose pages into its automatic feed). You can use gs with the same command line and only one source file to embed fonts into a PDF document without regenerating it, provided the fonts are available on the system where you do it. Unfortunately the resulting document can be significantly larger, not because of the embedded fonts, but because gs is inefficient at re-encoding the images (you can see that it is not due to the fonts by trying -dEmbedAllFonts=false).
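
The single-file case could be sketched as follows. The names HQ_OPTS and embed_fonts are invented for this example, and the output option is written as -sOutputFile, which is the spelling used in the Ghostscript documentation; everything else mirrors the options discussed above.

```shell
# The quality and compatibility options from the ps2pdf command line, kept in one place.
HQ_OPTS="-sPAPERSIZE=a4 -dCompatibilityLevel=1.3 \
 -dEmbedAllFonts=true -dSubsetFonts=true -dMaxSubsetPct=100 \
 -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode \
 -dAutoFilterGrayImages=false -dGrayImageFilter=/FlateEncode \
 -dAutoFilterMonoImages=false -dMonoImageFilter=/CCITTFaxEncode"

# embed_fonts INPUT.pdf OUTPUT.pdf  (hypothetical helper name)
# Rewrites INPUT.pdf through pdfwrite so that available fonts get embedded.
embed_fonts() {
    # $HQ_OPTS is left unquoted on purpose so that it splits into separate options.
    gs -dBATCH -dNOPAUSE -dSAFER -sDEVICE=pdfwrite -sOutputFile="$2" \
       $HQ_OPTS "$1"
}

# Usage: embed_fonts original.pdf with-fonts.pdf
```

Comparing the sizes of the two files afterwards (for instance with ls -l) shows how much the re-encoding of the images costs.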

You can use the pdffonts command to find out which of the fonts used in a PDF document are embedded, and whether they are embedded as subsets.
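
As a small sketch of how this check could be automated, the filter below prints only those fonts that pdffonts reports as not embedded. It assumes the usual table layout of pdffonts, with a header line containing the column name emb followed by a line of dashes; the helper name list_unembedded is invented for this example.

```shell
# Print the lines of pdffonts output whose "emb" column says "no".
# The column is located by counting from the end of the line, because font
# names and types (such as "Type 1") can themselves contain spaces.
list_unembedded() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "emb") back = NF - i; next }
        NR == 2 { next }    # line of dashes under the header
        $(NF - back) == "no" { print }
    '
}

# Only runs when the tool and the file are actually present.
if command -v pdffonts >/dev/null 2>&1 && [ -f document.pdf ]; then
    pdffonts document.pdf | list_unembedded
fi
```

An empty result means that every font listed by pdffonts is embedded; the neighbouring sub column of the full listing tells you whether a font is embedded as a subset.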