Posts for the month of November 2021

Filtering embedded timestamps from PNGs

PNG files created by ImageMagick include an embedded timestamp. When programmatically generating images, an embedded timestamp is sometimes undesirable. If you want to ensure that your input data always generates the exact same output data, bit-for-bit, embedded timestamps break what you're doing. Cryptographic signature schemes and content-addressable storage mechanisms do not allow for even small changes to file content without changing the signature or address. And if you are regenerating files from some source and tracking changes in the generated files, updated timestamps just add noise.

The PNG format has an 8-byte header followed by a series of chunks, each made up of a 4-byte length, a 4-byte type, the content, and a 4-byte CRC checksum. The embedded timestamp chunk has a type of tIME. For images created directly by ImageMagick, there are also creation and modification timestamps in tEXt chunks.
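
That layout is simple enough to walk by hand. Here is a minimal sketch in Python 3, assuming the whole file fits in memory. The name iter_chunks is made up for this illustration and is not part of png_chunk_filter.py:

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte header


def iter_chunks(data):
    """Yield (type, content, crc_ok) for each chunk in the PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        # 4-byte big-endian length, then the 4-byte chunk type
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        content = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        # The checksum is a CRC-32 over the type and content, not the length
        crc_ok = zlib.crc32(ctype + content) == crc
        yield ctype.decode("ascii"), content, crc_ok
        pos += 12 + length
```

Running list(iter_chunks(open("white-pixel-1.png", "rb").read())) on an ImageMagick-generated file should show the tIME and tEXt chunks among the others.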

To see this for yourself:

convert -size 1x1 xc:white white-pixel-1.png
sleep 1
convert -size 1x1 xc:white white-pixel-2.png
cmp -l white-pixel-{1,2}.png

That will generate two 258-byte PNG files and list every byte that differs between them. You should see output something like this (the byte offset in decimal, followed by that byte's value in each file, in octal):

122   3   4
123 374 142
124  52 116
125 112 337
126 123 360
187  63  64
194 156 253
195 261  26
196  32  44
197  30 226
236  63  64
243  37 332
244 354 113
245 242 234
246 244  52
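
If cmp is not available, the same comparison can be approximated in Python 3. This is only a sketch: cmp_l is a hypothetical helper, the column widths are not guaranteed to match cmp exactly, and it assumes both inputs have the same length:

```python
def cmp_l(a, b):
    """Return lines like `cmp -l`: 1-based decimal offset, then both bytes in octal."""
    return [
        f"{i:3d} {x:3o} {y:3o}"
        for i, (x, y) in enumerate(zip(a, b), start=1)
        if x != y
    ]
```

For the two files above, "\n".join(cmp_l(open("white-pixel-1.png", "rb").read(), open("white-pixel-2.png", "rb").read())) should list the same offsets that cmp reports.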

I have a project where I want to avoid these types of changes in PNG files generated from processed inputs. We can remove these differences from the binaries by iterating over the chunks and dropping those with a type of either tIME or tEXt. So I wrote a bit of (Python 3) code (png_chunk_filter.py) that allows filtering specific chunk types from PNG files without making any other modifications to them.

./png_chunk_filter.py --verbose --exclude tIME --exclude tEXt \
    white-pixel-1.png white-pixel-1-cleaned.png
./png_chunk_filter.py --verbose --exclude tIME --exclude tEXt \
    white-pixel-2.png white-pixel-2-cleaned.png
cmp -l white-pixel-{1,2}-cleaned.png

Because of the --verbose option, you should see this output:

Excluding tEXt, tIME chunks
Found IHDR chunk
Found gAMA chunk
Found cHRM chunk
Found bKGD chunk
Found tIME chunk
Excluding tIME chunk
Found IDAT chunk
Found tEXt chunk
Excluding tEXt chunk
Found tEXt chunk
Excluding tEXt chunk
Found IEND chunk
Excluding tEXt, tIME chunks
Found IHDR chunk
Found gAMA chunk
Found cHRM chunk
Found bKGD chunk
Found tIME chunk
Excluding tIME chunk
Found IDAT chunk
Found tEXt chunk
Excluding tEXt chunk
Found tEXt chunk
Excluding tEXt chunk
Found IEND chunk

The cleaned PNG files are each 141 bytes, and the two are identical.

Running the script with --help shows its options:

usage: png_chunk_filter.py [-h] [--exclude EXCLUDE] [--verbose] filename target

Filter chunks from a PNG file.

positional arguments:
  filename
  target

optional arguments:
  -h, --help         show this help message and exit
  --exclude EXCLUDE  chunk types to remove from the PNG image.
  --verbose          list chunks encountered and exclusions
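
The filtering step itself is small. The following is a rough Python 3 sketch of the idea, not the actual png_chunk_filter.py. The function name filter_chunks and its in-memory bytes interface are assumptions made for this illustration:

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def filter_chunks(data, exclude):
    """Return PNG bytes with every chunk whose type is in `exclude` dropped."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    out = [PNG_SIGNATURE]
    pos = 8
    while pos < len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        ctype = data[pos + 4:pos + 8].decode("ascii")
        end = pos + 12 + length  # length + type + content + checksum
        if ctype not in exclude:
            # Copy the chunk byte-for-byte; its checksum stays valid
            out.append(data[pos:end])
        pos = end
    return b"".join(out)
```

Because each chunk's checksum covers only that chunk's own type and content, the chunks that remain can be copied through unchanged, and nothing needs to be recomputed.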

The code also accepts - in place of the filenames to read from stdin and/or write to stdout so that it can be used in a shell pipeline.
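
One common way to support that convention in Python 3 is sketched below. The helper name open_binary is hypothetical, and the real script may handle this differently:

```python
import sys


def open_binary(name, mode):
    """Open `name` in binary mode, mapping "-" to stdin or stdout."""
    if name == "-":
        # sys.stdin and sys.stdout are text streams; .buffer is the byte stream
        return sys.stdin.buffer if "r" in mode else sys.stdout.buffer
    return open(name, mode)
```

The .buffer attribute matters here: PNG data is binary, so it has to bypass the text decoding that sys.stdin and sys.stdout normally apply.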

Another use for this code is stripping every unnecessary byte from a PNG to achieve a minimum size.

./png_chunk_filter.py --verbose \
    --exclude gAMA \
    --exclude cHRM \
    --exclude bKGD \
    --exclude tIME \
    --exclude tEXt \
    white-pixel-1.png minimal.png

That strips our 258-byte PNG down to a still-valid 67-byte PNG file.

Filtering of PNG files solved a problem I faced; perhaps it will help you at some point as well.