Filtering embedded timestamps from PNGs
PNG files created by ImageMagick include an embedded timestamp. When programmatically generating images, an embedded timestamp is sometimes undesirable. If you want to ensure that your input data always generates the exact same output data, bit-for-bit, embedded timestamps break what you're doing. Cryptographic signature schemes and content-addressable storage mechanisms do not allow for even small changes to file content without changing the signature or address. And if you are regenerating files from some source and tracking changes in the generated files, updated timestamps just add noise.
The PNG format has an 8-byte signature followed by a series of chunks, each made up of a 4-byte length, a 4-byte type, the content, and a 4-byte CRC checksum. The embedded timestamp chunk has a type of tIME. For images created directly by ImageMagick, there are also creation and modification timestamps in tEXt chunks.
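To make the chunk layout concrete, here is a minimal sketch of walking the chunks of a PNG held in memory. It is not the attached script; the names iter_chunks and make_chunk and the tiny sample image (including its made-up timestamp) are assumptions for illustration only.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8 bytes at the start of every PNG


def iter_chunks(data):
    """Yield (type, content, crc_ok) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        # Each chunk starts with a big-endian 4-byte length and a 4-byte type.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        content = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        # The CRC covers the type and the content, but not the length field.
        crc_ok = zlib.crc32(ctype + content) == crc
        yield ctype.decode("ascii"), content, crc_ok
        pos += 12 + length


def make_chunk(ctype, content):
    """Hypothetical helper used only to build the sample data below."""
    return (struct.pack(">I", len(content)) + ctype + content
            + struct.pack(">I", zlib.crc32(ctype + content)))


# A made-up 1x1 grayscale image with a tIME chunk, built in memory.
sample = (
    PNG_SIGNATURE
    + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
    + make_chunk(b"tIME", struct.pack(">HBBBBB", 2020, 1, 1, 0, 0, 3))
    + make_chunk(b"IDAT", zlib.compress(b"\x00\xff"))
    + make_chunk(b"IEND", b"")
)

for chunk_type, content, crc_ok in iter_chunks(sample):
    print(chunk_type, len(content), crc_ok)
```

Each chunk costs 12 bytes of overhead (length, type, and CRC) on top of its content, which is why dropping even an empty chunk changes the file size.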
To see this for yourself:
    convert -size 1x1 xc:white white-pixel-1.png
    sleep 1
    convert -size 1x1 xc:white white-pixel-2.png
    cmp -l white-pixel-{1,2}.png
That will generate two 258-byte PNG files, and show the differences between the binaries. You should see output something like this:
    122   3   4
    123 374 142
    124  52 116
    125 112 337
    126 123 360
    187  63  64
    194 156 253
    195 261  26
    196  32  44
    197  30 226
    236  63  64
    243  37 332
    244 354 113
    245 242 234
    246 244  52
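The cmp -l output is easier to read once decoded: each row is a 1-based byte offset in decimal followed by the two differing byte values in octal. The sketch below parses three rows copied from the output above; the function name parse_cmp_l is an assumption for illustration.

```python
# Three rows copied from the cmp -l output shown above.
CMP_OUTPUT = """\
122   3   4
187  63  64
236  63  64
"""


def parse_cmp_l(text):
    """Return (offset, first_byte, second_byte) tuples from cmp -l output."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        offset, first, second = line.split()
        # The offset is decimal; the byte values are printed in octal.
        rows.append((int(offset), int(first, 8), int(second, 8)))
    return rows


for offset, first, second in parse_cmp_l(CMP_OUTPUT):
    print(offset, first, second, repr(chr(first)), repr(chr(second)))
```

Offset 122 holds the raw byte values 3 and 4, and offsets 187 and 236 hold the ASCII digits '3' and '4'. That is what you would expect from a seconds field that advanced by one between the two runs, stored once in binary and twice as text. The remaining rows come in runs of four bytes, which is consistent with the CRC of each affected chunk changing along with its content.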
I have a project where I want to avoid these types of changes in PNG files generated from processed inputs. We can remove these differences from the binaries by iterating over the chunks and dropping those with a type of either tIME or tEXt. So I wrote a bit of (Python 3) code (png_chunk_filter.py) that allows filtering specific chunk types from PNG files without making any other modifications to them.
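The idea of dropping chunks by type can be sketched in a few lines. This is an illustration under stated assumptions, not the attached png_chunk_filter.py: the names filter_chunks, make_chunk, and sample_png are hypothetical, and the sample images are built in memory with made-up timestamps.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def filter_chunks(data, exclude):
    """Return a copy of the PNG bytes without chunks whose type is in exclude."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    out = [data[:8]]
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 12 + length  # length + type + content + CRC
        if ctype.decode("ascii") not in exclude:
            # Kept chunks are copied byte-for-byte, CRC included.
            out.append(data[pos:end])
        pos = end
    return b"".join(out)


def make_chunk(ctype, content):
    """Hypothetical helper used only to build the sample data below."""
    return (struct.pack(">I", len(content)) + ctype + content
            + struct.pack(">I", zlib.crc32(ctype + content)))


def sample_png(second):
    """Build a made-up 1x1 grayscale PNG whose tIME chunk varies by second."""
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
        + make_chunk(b"tIME", struct.pack(">HBBBBB", 2020, 1, 1, 0, 0, second))
        + make_chunk(b"IDAT", zlib.compress(b"\x00\xff"))
        + make_chunk(b"IEND", b"")
    )


first, second = sample_png(3), sample_png(4)
cleaned_first = filter_chunks(first, {"tIME", "tEXt"})
cleaned_second = filter_chunks(second, {"tIME", "tEXt"})
print(first == second, cleaned_first == cleaned_second)  # prints: False True
```

Because the remaining chunks are copied verbatim rather than re-encoded, nothing else in the file changes, which is the property that matters for bit-for-bit reproducibility.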
    ./png_chunk_filter.py --verbose --exclude tIME --exclude tEXt \
        white-pixel-1.png white-pixel-1-cleaned.png
    ./png_chunk_filter.py --verbose --exclude tIME --exclude tEXt \
        white-pixel-2.png white-pixel-2-cleaned.png
    cmp -l white-pixel-{1,2}-cleaned.png
Because of the --verbose option, you should see this output:
    Excluding tEXt, tIME chunks
    Found IHDR chunk
    Found gAMA chunk
    Found cHRM chunk
    Found bKGD chunk
    Found tIME chunk
    Excluding tIME chunk
    Found IDAT chunk
    Found tEXt chunk
    Excluding tEXt chunk
    Found tEXt chunk
    Excluding tEXt chunk
    Found IEND chunk
    Excluding tEXt, tIME chunks
    Found IHDR chunk
    Found gAMA chunk
    Found cHRM chunk
    Found bKGD chunk
    Found tIME chunk
    Excluding tIME chunk
    Found IDAT chunk
    Found tEXt chunk
    Excluding tEXt chunk
    Found tEXt chunk
    Excluding tEXt chunk
    Found IEND chunk
The cleaned PNG files are each 141 bytes, and both are identical.
The script's usage message:

    usage: png_chunk_filter.py [-h] [--exclude EXCLUDE] [--verbose] filename target

    Filter chunks from a PNG file.

    positional arguments:
      filename
      target

    optional arguments:
      -h, --help         show this help message and exit
      --exclude EXCLUDE  chunk types to remove from the PNG image.
      --verbose          list chunks encountered and exclusions
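For anyone building a similar tool, here is a sketch of an argparse setup that accepts the same arguments. It is a guess at one reasonable arrangement, not the code of the attached script; in particular, using action="append" for the repeated --exclude option is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that accepts the options shown in the usage message."""
    parser = argparse.ArgumentParser(
        prog="png_chunk_filter.py",
        description="Filter chunks from a PNG file.")
    # Assumption: each --exclude adds one chunk type to a list.
    parser.add_argument("--exclude", action="append", default=[],
                        help="chunk types to remove from the PNG image.")
    parser.add_argument("--verbose", action="store_true",
                        help="list chunks encountered and exclusions")
    parser.add_argument("filename")
    parser.add_argument("target")
    return parser


args = build_parser().parse_args(
    ["--exclude", "tIME", "--exclude", "tEXt", "in.png", "out.png"])
print(args.exclude, args.verbose, args.filename, args.target)
```

Note that newer Python versions label the second group in the help text "options:" rather than "optional arguments:", so the generated message may differ slightly from the one above.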
The code also accepts - in place of the filenames to read from stdin and/or write to stdout so that it can be used in a shell pipeline.
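The "-" convention is straightforward to support in Python, as long as the standard streams are used in binary mode. The following is a minimal sketch of one way to do it, with hypothetical helper names (open_binary, copy); it does not claim to match how the attached script handles it.

```python
import os
import sys
import tempfile


def open_binary(name, mode):
    """Open name in binary mode; '-' selects stdin ('rb') or stdout ('wb')."""
    if name == "-":
        # PNG data is binary, so use the underlying byte streams.
        return sys.stdin.buffer if mode == "rb" else sys.stdout.buffer
    return open(name, mode)


def copy(source, target):
    """Copy bytes from source to target, either of which may be '-'."""
    src = open_binary(source, "rb")
    dst = open_binary(target, "wb")
    try:
        dst.write(src.read())
    finally:
        # Only close real files; leave the standard streams open.
        if source != "-":
            src.close()
        if target != "-":
            dst.close()


# Demonstrate with real files so the example runs without piped input.
with tempfile.TemporaryDirectory() as tmp:
    first_path = os.path.join(tmp, "a.bin")
    second_path = os.path.join(tmp, "b.bin")
    with open(first_path, "wb") as handle:
        handle.write(b"\x89PNG")
    copy(first_path, second_path)
    with open(second_path, "rb") as handle:
        round_trip = handle.read()

print(round_trip)
```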
Another use for this code is stripping every unnecessary byte from a PNG to achieve a minimum size.
    ./png_chunk_filter.py --verbose \
        --exclude gAMA \
        --exclude cHRM \
        --exclude bKGD \
        --exclude tIME \
        --exclude tEXt \
        white-pixel-1.png minimal.png
That strips our 258-byte PNG down to a still-valid 67-byte PNG file.
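The 67-byte figure can be sanity-checked with a little arithmetic on the chunk layout. The sketch below uses the data lengths that the PNG specification fixes for IHDR, IEND, gAMA, and cHRM; the 2-byte bKGD length is an assumption that holds for a grayscale image, and the function name chunk_size is made up for illustration.

```python
SIGNATURE = 8        # fixed PNG signature
CHUNK_OVERHEAD = 12  # 4-byte length + 4-byte type + 4-byte CRC


def chunk_size(data_length):
    """Total bytes a chunk occupies in the file."""
    return CHUNK_OVERHEAD + data_length


# Data lengths fixed by the PNG specification.
IHDR, IEND, GAMA, CHRM = 13, 0, 4, 32

# A minimal file has only IHDR, IDAT, and IEND after the signature.
fixed = SIGNATURE + chunk_size(IHDR) + chunk_size(IEND)
idat_data = 67 - fixed - CHUNK_OVERHEAD

# Assumption: the image is grayscale, so bKGD carries 2 bytes of data.
BKGD = 2
removed = chunk_size(GAMA) + chunk_size(CHRM) + chunk_size(BKGD)

print(fixed, idat_data, removed, 141 - 67)  # prints: 45 10 74 74
```

Under those assumptions, 45 of the 67 bytes are unavoidable structure, leaving 10 bytes of compressed pixel data, and the 74 bytes saved relative to the 141-byte cleaned file match the combined size of the gAMA, cHRM, and bKGD chunks.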
Filtering of PNG files solved a problem I faced; perhaps it will help you at some point as well.
Attachments (1): png_chunk_filter.py (2.8 KB), added by retracile 3 years ago.