Optimize PDF files - Part II
Your applications can write incredibly small PDF files if you know what you're doing. This article is intended for programmers who create PDF files programmatically using custom routines. Read Part I if you save PDFs as an end user, or if you want a general understanding of PDF optimization.
- Part I
- Part II
Amaze your users by saving small, high-quality PDF files. Users expecting multimegabyte PDFs will be pleased to find out your application requires only tens of kilobytes, even just a few kilobytes, for a simple PDF report.
This article assumes you are writing PDF files programmatically. It further assumes you are doing this with a PDF writer module or class for which you have the source code, so that you can actually fine-tune the PDF output. You need to know quite a bit about the PDF file format to take advantage of these techniques.
The article focuses on PDF v1.3. The optimizations are likely to be just as useful with other versions. Get PDF Reference, Adobe Portable Document Format Version 1.3, to follow the tricks.
Font optimization
Let's start with the obvious optimization: fonts.
Optimization #1: Don't embed fonts
Did you know fonts don't need to be embedded in PDF? Font embedding is optional. The PDF standard allows you to use any font, whether or not it exists on the reader's machine. If a required font is not found, PDF reader applications use font metrics (in /FontDescriptor) to find a reasonable replacement font.
Indeed, in many cases font embedding will unnecessarily bloat the file. Consider an application that creates reports, which are mainly used by the same user on the same PC. They will display perfectly well as long as they stay on that PC (unless the user happens to uninstall the required fonts). When common fonts are used, there is a good chance the fonts will show up correctly on other machines too.
Optimization #2: Use standard fonts
PDF comes with 5 standard font families: Times, Helvetica, Courier, Symbol and ZapfDingbats. Counting the bold and italic variants, they make up 14 standard fonts. All PDF readers support these standard fonts. All of them except ZapfDingbats have a close counterpart among the standard Windows fonts.
The standard fonts never need embedding, so you can safely use them without adding font data to the file.
PDF font | Windows font | Sample |
---|---|---|
Times | Times New Roman | Times is a serif font |
Helvetica | Arial | Helvetica is a sans-serif font |
Courier | Courier New | Courier is a fixed-width font |
Symbol | Symbol | Symbol is, well, a symbol font |
ZapfDingbats | (ZapfDingbats) | ZapfDingbats includes symbols and ornaments |
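As a rough illustration, a font object for a standard font can be as short as the one produced by the sketch below. The function name `standard_font_object` and the object-numbering scheme are hypothetical and not taken from any particular PDF writer; the sketch assumes PDF 1.3, where a standard font needs neither /Widths nor /FontDescriptor.

```python
# The 14 standard font names defined by the PDF Reference.
STANDARD_BASE_FONTS = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}


def standard_font_object(obj_num, base_font):
    """Return a minimal indirect font object for a standard font.

    No font file, /Widths array or /FontDescriptor is written,
    because the reader already knows the standard fonts.
    """
    if base_font not in STANDARD_BASE_FONTS:
        raise ValueError(f"{base_font} is not a standard PDF font")
    return (
        f"{obj_num} 0 obj\n"
        f"<< /Type /Font /Subtype /Type1 /BaseFont /{base_font} >>\n"
        "endobj\n"
    )


print(standard_font_object(5, "Helvetica"))
```

The whole font definition takes well under a hundred bytes, compared with tens of kilobytes for an embedded font file.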
Optimized representation of text and numeric values
Don't bloat PDFs by representing text and numeric values with too many bytes. You can potentially do the same with less.
Optimization #3: Use PDFDocEncoding
This is a relatively small optimization, but simple enough. Text strings, such as /Subject in file info or /Title in /Outlines, can be in either Unicode or PDFDocEncoding. Unicode takes twice the space: two bytes per character compared to one byte with PDFDocEncoding. Use Unicode only when the content cannot be represented in PDFDocEncoding.
Note that PDFDocEncoding contains a wider range of characters than WinAnsiEncoding or MacRomanEncoding. It's good news for optimizers.
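The choice between the two encodings can be sketched as follows. This is a simplified sketch, not a full implementation: Python has no built-in PDFDocEncoding codec, so the hypothetical helper `pdf_text_string` takes the one-byte path only for printable ASCII, which PDFDocEncoding shares with ASCII. A real writer would use the complete PDFDocEncoding table from the PDF Reference, so that accented characters also stay at one byte each. The Unicode path is written as a hex string for readability.

```python
def pdf_text_string(text):
    """Encode a text string with one byte per character when possible."""
    # Assumption: only printable ASCII is treated as PDFDocEncoding here.
    if all(32 <= ord(ch) <= 126 for ch in text):
        escaped = (
            text.replace("\\", "\\\\")  # escape backslash first
            .replace("(", "\\(")
            .replace(")", "\\)")
        )
        return f"({escaped})"
    # Fallback: UTF-16BE with the byte order mark FE FF, as a hex string.
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


print(pdf_text_string("Monthly report (2010)"))
print(pdf_text_string("Price in \u20ac"))
```

The first call stays in the short one-byte form; only the second, which contains a euro sign outside the ASCII subset, falls back to Unicode.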
Optimization #4: Optimize number of decimal digits
This optimization is for representing all numeric values in PDF. Use only as few decimals as required. It's unnecessary to bloat the file with too many useless decimals.
Write a utility function that rounds values for you and drops trailing zeros. Supposing you need two decimals of precision, the function should round like this:
1.2345 → 1.23
1.2000 → 1.2
1.0000 → 1
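One possible shape for such a utility function is sketched below. The name `fmt_num` and its default of two decimals are illustrative choices, not part of any existing library.

```python
def fmt_num(value, decimals=2):
    """Format a number with at most `decimals` decimals and no trailing zeros."""
    text = f"{value:.{decimals}f}"
    if "." in text:
        # Drop trailing zeros, then a dangling decimal point.
        text = text.rstrip("0").rstrip(".")
    if text == "-0":
        # Small negative values can round to "-0"; plain "0" is shorter.
        text = "0"
    return text


print(fmt_num(1.2345), fmt_num(1.2000), fmt_num(1.0000))
```

Fixed-point formatting is used on purpose: exponent notation such as 1e-05 is not valid PDF number syntax.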
Stream optimization
Now we get to optimizing streams, the actual page content.
Optimization #5: Clip to viewable area
This rule is especially important if you're drawing a part of a larger graphic into PDF. When drawing graphics objects or text it's a good idea to check for page boundaries. If no part of the object will be visible, there's no point adding the respective drawing operations in the PDF file. The result will be invisible anyway. Besides, hidden data in a PDF is a security concern.
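A simple way to apply this rule is a bounding-box test before any operators are written. The sketch below assumes both the object and the page are described as (x0, y0, x1, y1) rectangles in the same coordinate system; the function name `is_visible` is hypothetical. The object's box should already include any stroke width, which is why boxes that merely touch the page edge are kept.

```python
def is_visible(bbox, page):
    """Return True if the object's bounding box overlaps or touches the page."""
    x0, y0, x1, y1 = bbox
    px0, py0, px1, py1 = page
    return x0 <= px1 and x1 >= px0 and y0 <= py1 and y1 >= py0


A4 = (0, 0, 595, 842)  # page size in points

shapes = [(10, 10, 50, 50), (600, 10, 700, 50), (-20, -20, 5, 5)]
to_draw = [s for s in shapes if is_visible(s, A4)]
print(to_draw)  # the second shape lies entirely to the right of the page
```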
Optimization #6: Don't repeat operators unnecessarily
PDF keeps track of the currently selected color, line width, font and so on. You don't need to select the color each time you draw a line. Only set the color when it needs to change. The same goes for line width, line cap style and other drawing attributes. Keep track of the current attributes in your writer, and emit an operator only when a value actually changes.
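The bookkeeping can be sketched with a small tracker class. The class name `StateTracker` and the attribute keys are hypothetical. Attributes are keyed by what they control rather than by operator, because several operators (for example g and rg) set the same attribute. The sketch ignores q/Q and page breaks; a real writer would have to restore or reset its record of the state at those points.

```python
class StateTracker:
    """Collect content-stream operations, skipping redundant state changes."""

    def __init__(self):
        self._current = {}  # attribute name -> last operation written
        self.ops = []

    def set(self, attribute, operation):
        """Write a state-setting operation only if it changes the attribute."""
        if self._current.get(attribute) != operation:
            self._current[attribute] = operation
            self.ops.append(operation)

    def draw(self, operation):
        """Write a drawing operation unconditionally."""
        self.ops.append(operation)


page = StateTracker()
for y in (100, 200, 300):
    page.set("stroke_color", "1 0 0 RG")  # written once, skipped twice
    page.set("line_width", "2 w")         # written once, skipped twice
    page.draw(f"50 {y} m 500 {y} l S")

print("\n".join(page.ops))
```

Three red lines end up as five operations in the stream instead of nine.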
Optimization #7: Close polygons
When drawing a polygon, there is no need to draw the last edge (with the l operator). Close polygons with the h operator instead. This closes a subpath by appending a straight line segment from the current point to the starting point.
Even better optimizations are available. Instead of h, use s to close and stroke the path and b to close, fill and stroke. (Note the lowercase b: uppercase B fills and strokes without closing the path.) There are even more options, see Path-painting operators in the PDF specification.
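The following sketch shows how a polygon routine might pick the painting operator. The function name `polygon_ops` is hypothetical. It relies on three operators from the PDF Reference: s closes and strokes, lowercase b closes, fills and strokes, and f fills, implicitly closing any open subpath first. In every case the last edge is never written with l.

```python
def polygon_ops(points, stroke=True, fill=False):
    """Return path operators for a closed polygon without the final edge."""
    if len(points) < 3:
        raise ValueError("a polygon needs at least three points")
    if not (stroke or fill):
        raise ValueError("nothing to paint")
    (x0, y0), rest = points[0], points[1:]
    parts = [f"{x0} {y0} m"] + [f"{x} {y} l" for x, y in rest]
    if stroke and fill:
        parts.append("b")  # close, fill and stroke
    elif stroke:
        parts.append("s")  # close and stroke
    else:
        parts.append("f")  # fill; open subpaths are closed implicitly
    return " ".join(parts)


triangle = [(0, 0), (100, 0), (100, 100)]
print(polygon_ops(triangle))
print(polygon_ops(triangle, fill=True))
```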
Optimization #8: Use shortcuts for splines
The default way to draw a spline curve from the current point to (x3,y3) with (x1,y1) and (x2,y2) as the control points is this:
x1 y1 x2 y2 x3 y3 c
If the current point and (x1,y1) are the same, there is a shorter form:
x2 y2 x3 y3 v
If points (x2,y2) and (x3,y3) are the same, use this shorter form:
x1 y1 x3 y3 y
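A writer can choose among the three forms automatically. The sketch below uses a hypothetical function `curve_op` that takes points as (x, y) tuples. It compares coordinates exactly; in practice you would probably compare them after rounding to the precision you write to the file.

```python
def curve_op(current, p1, p2, p3):
    """Return the shortest curve operator for a cubic Bezier segment."""
    if p1 == current:
        points, op = (p2, p3), "v"      # first control point = current point
    elif p2 == p3:
        points, op = (p1, p3), "y"      # second control point = end point
    else:
        points, op = (p1, p2, p3), "c"  # general form
    return " ".join(f"{x} {y}" for x, y in points) + f" {op}"


print(curve_op((0, 0), (0, 0), (50, 100), (100, 0)))
print(curve_op((0, 0), (10, 50), (100, 0), (100, 0)))
print(curve_op((0, 0), (10, 50), (90, 50), (100, 0)))
```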
Optimization #9: Use color shortcuts
The standard operators to set color are:
0.123 0.123 0.123 rg
0.123 0.123 0.123 RG
These operators take 3 values: Red, Green and Blue.
For black, gray or white colors you don't need the full RGB color space. Grayscale is enough. To select black, use one of these operators:
0 g
0 G
For white, use these:
1 g
1 G
You can do the same for any shade of gray. To select 0.123 gray, use one of the following:
0.123 g
0.123 G
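The selection between the RGB and grayscale operators is easy to automate. In this sketch the function name `color_op` and the three-decimal precision are illustrative assumptions; components are expected in the range 0 to 1.

```python
def _num(value):
    """Format a color component with at most 3 decimals, no trailing zeros."""
    return f"{value:.3f}".rstrip("0").rstrip(".")


def color_op(r, g, b, stroke=False):
    """Return the shortest operator that selects the given RGB color."""
    if r == g == b:
        # Black, white or a shade of gray: one operand is enough.
        return f"{_num(r)} {'G' if stroke else 'g'}"
    return f"{_num(r)} {_num(g)} {_num(b)} {'RG' if stroke else 'rg'}"


print(color_op(0, 0, 0))                 # fill color black
print(color_op(0.123, 0.123, 0.123))     # fill color gray
print(color_op(1, 0, 0, stroke=True))    # stroke color red
```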
Optimization #10: Compress
Compress streams to get the size down. Get a copy of the zlib library to do the compression for you. zlib is relatively straightforward to use.
Visual Basic note: VB6 cannot call the regular zlib.dll, but you can use zlibwapi.dll instead.
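For illustration, here is how a compressed stream object might be written in Python, whose standard `zlib` module wraps the same library. The function name `flate_stream_object` and the object numbering are hypothetical. The essential parts are the /Filter /FlateDecode entry and a /Length that counts the compressed bytes.

```python
import zlib


def flate_stream_object(obj_num, content):
    """Return a stream object whose data is zlib-compressed (FlateDecode)."""
    compressed = zlib.compress(content, 9)  # 9 = best compression
    header = (
        f"{obj_num} 0 obj\n"
        f"<< /Length {len(compressed)} /Filter /FlateDecode >>\n"
        "stream\n"
    ).encode("ascii")
    return header + compressed + b"\nendstream\nendobj\n"


# A repetitive page content stream compresses very well.
content = b"".join(b"50 %d m 500 %d l S\n" % (y, y) for y in range(100, 700, 10))
obj = flate_stream_object(7, content)
print(len(content), "bytes of content,", len(obj), "bytes as a compressed object")
```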
Small PDF samples
Here are small PDF samples with vector graphics and text. The graphic was originally created with Visustin.
- Uncompressed PDF (3987 bytes) is an optimized, but not compressed, PDF.
- Compressed PDF (2274 bytes) is the same document compressed with zlib.
- Word PDF (4641 bytes) is the same document saved by Word 2007 (with optimal settings). Not large, yet twice the size! The difference is largely due to the stream (page content). Word also unnecessarily added character widths in 9 0 obj, even though the document should really use a PDF standard font with built-in widths.
URN:NBN:fi-fe201002171381