Thumbnail Saving

The thumbnail filename is determined by a hash function. This paper proposes using MD5 as the hash mechanism in the following way.

  1. You need the absolute URI for the original file.

  2. Calculate the MD5 hash of this URI, not of the file it points to! The result is a 128-bit hash, represented as a 32-character hexadecimal string.

  3. The last 30 characters of this string determine the filename of the thumbnail. The file is stored in the subdirectories given by the first two characters of the hash.

  4. Depending on the size of the thumbnail, create (or reuse, if it already exists) a directory named after the first hash character inside the appropriate size directory in ~/.thumbnails, and inside that one another directory named after the second hash character.
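The four steps above can be sketched in Python with the standard hashlib module. This is an illustration, not a reference implementation: the helper name thumbnail_path is hypothetical, and it assumes the URI string is encoded as UTF-8 before hashing, which this section does not state explicitly.

```python
import hashlib
import os


def thumbnail_path(thumbnail_root: str, size_dir: str, uri: str) -> str:
    """Return the thumbnail path for `uri` (hypothetical helper).

    thumbnail_root: the thumbnail directory, e.g. os.path.expanduser("~/.thumbnails")
    size_dir:       the size directory, e.g. "96x96"
    uri:            the absolute URI of the original file
    """
    # Hash the URI text itself, not the contents of the file it points to.
    # Assumption: the URI is encoded as UTF-8 before hashing.
    digest = hashlib.md5(uri.encode("utf-8")).hexdigest()  # 32 hex characters

    # First character -> first subdirectory, second -> second subdirectory,
    # remaining 30 characters -> thumbnail filename.
    return os.path.join(thumbnail_root, size_dir, digest[0], digest[1], digest[2:])
```

Passing the root /home/jens/.thumbnails, the size directory 96x96 and the URI from Example 1 should reproduce the path shown there.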

An example will illustrate this.

Example 1. Saving a thumbnail

Suppose we have the file ~/photos/me.png and want to create a medium-sized thumbnail for it. This means the thumbnail has a maximum size of 96x96 pixels and is stored in the directory ~/.thumbnails/96x96. In this example the absolute URI of the file is file:///home/jens/photos/me.png.
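For a local file, one way to obtain the absolute URI in Python is pathlib's as_uri() method, sketched below. The helper name local_file_uri is hypothetical. Note that as_uri() percent-encodes characters such as spaces (a space becomes %20); that is Python's behavior, and this document does not itself specify how such characters must be escaped. The sketch also uses absolute() rather than resolve(), so symbolic links are not followed; whether they should be is likewise an assumption left open here.

```python
from pathlib import Path


def local_file_uri(path: str) -> str:
    """Return an absolute file:// URI for a local path (hypothetical helper)."""
    # Expand "~" and make the path absolute; as_uri() rejects relative paths.
    absolute = Path(path).expanduser().absolute()
    # as_uri() percent-encodes reserved characters, e.g. " " -> "%20".
    return absolute.as_uri()
```

For example, local_file_uri("/home/jens/photos/me.png") returns "file:///home/jens/photos/me.png".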

The MD5 hash of this URI, as a hex string, is c6ee772d9e49320e97ec29a7eb5b1697. Following the steps above, we create (or reuse the existing) directory 'c' within ~/.thumbnails/96x96, and then the directory '6' within 'c'. Using the last 30 hash characters as the thumbnail filename gives the following path:

/home/jens/.thumbnails/96x96/c/6/ee772d9e49320e97ec29a7eb5b1697

Permissions

A few words on permissions: every thumbnail file must have its permissions set to 600 (only the owner can read and write it), and every directory, including ~/.thumbnails itself, must have its permissions set to 700 (only the owner can list, enter and modify it; without the execute bit even the owner could not enter the directory). See "man chmod" for details. This ensures that if a user creates a thumbnail for a file that only they can read, no other user can get a glimpse of it through the back door of the thumbnails.
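A possible way to honor these permissions when saving is sketched below. It is an illustration only: the helper names are hypothetical, it takes an already computed hex digest, and it uses mode 700 for directories because a directory needs the owner's execute bit to be entered (mode 600 on a directory would lock out its owner) and mode 600 for the thumbnail file. It calls chmod explicitly because the process umask can clear bits from the mode passed at creation time. It relies on os.fchmod, which is available on Unix only.

```python
import os


def _ensure_private_dir(path: str) -> None:
    """Create `path` if missing and make it accessible to the owner only."""
    try:
        os.mkdir(path, 0o700)
    except FileExistsError:
        pass
    # The umask may have stripped bits at creation; also fixes existing dirs.
    os.chmod(path, 0o700)


def save_thumbnail(thumbnail_root: str, size_dir: str, digest: str, data: bytes) -> str:
    """Write `data` to root/size_dir/d0/d1/<last 30 chars> (hypothetical helper)."""
    directory = thumbnail_root
    _ensure_private_dir(directory)  # e.g. ~/.thumbnails itself
    for part in (size_dir, digest[0], digest[1]):
        directory = os.path.join(directory, part)
        _ensure_private_dir(directory)

    path = os.path.join(directory, digest[2:])
    # Create the file with owner-only permissions from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        os.fchmod(f.fileno(), 0o600)  # also corrects a pre-existing file's mode
        f.write(data)
    return path
```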

Advantages of this approach

Previous versions of this standard used a very different mechanism, but this one has some very important advantages:

  1. It works for all kinds of file locations, since it is based only on the textual URI representation of a file. This way, files located on the local filesystem or on a Samba, HTTP, FTP or WebDAV server can be treated equally.

  2. It results in a flat directory hierarchy, which keeps access fast. Since the hash uses only hexadecimal characters, each of the two subdirectory levels contains at most 16 subdirectories. If each leaf of the directory tree holds 256 thumbnails, the tree already stores 16*16*256 = 65536 thumbnails!
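To see how evenly thumbnails spread over the 16*16 = 256 leaf directories, the sketch below hashes a set of synthetic URIs and counts how many land in each leaf (keyed by the first two hex characters). The URIs are made up for illustration; real file names are not uniformly random, but MD5 output is expected to be well spread either way.

```python
import hashlib
from collections import Counter


def bucket_counts(uris):
    """Count URIs per leaf directory, keyed by the first two hex characters."""
    return Counter(hashlib.md5(u.encode("utf-8")).hexdigest()[:2] for u in uris)


# Hypothetical sample: 65536 synthetic file URIs.
uris = [f"file:///home/jens/photos/img{i:05d}.png" for i in range(65536)]
counts = bucket_counts(uris)

print(len(counts))  # number of occupied leaf directories (at most 256)
print(min(counts.values()), max(counts.values()))  # spread around the mean of 256 per leaf
```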

  3. Because MD5 is used, collisions between two different thumbnails are unlikely. They are theoretically possible, but the probability is so low that it can be ignored in this context. In the worst case, one thumbnail overwrites another valid one; and if the two original files also have exactly the same modification time, it is theoretically possible that the wrong thumbnail is displayed for a file (see Detect Modifications).
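How low that probability is can be estimated with the standard birthday-bound approximation, assuming MD5 outputs behave like uniformly random 128-bit values (the two directory characters plus the 30 filename characters use the full 32-character hash). This models accidental collisions only; MD5 is known not to resist deliberately constructed collisions.

```python
import math


def collision_probability(n: int, bits: int = 128) -> float:
    """Approximate P(at least one collision) among n random `bits`-bit hashes.

    Birthday bound: p ~= 1 - exp(-n * (n - 1) / 2**(bits + 1)).
    """
    exponent = n * (n - 1) / 2 ** (bits + 1)
    # -expm1(-x) computes 1 - exp(-x) accurately for very small x.
    return -math.expm1(-exponent)


print(collision_probability(65536))  # about 6.3e-30 for the 65536 thumbnails above
```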

  4. It's very easy to implement.

Note

There are many different library implementations of the MD5 hash algorithm. If you don't want to add yet another library dependency to support thumbnailing in your program, you can, e.g., use the RFC 1321 implementation by L. Peter Deutsch. It adds only 1.5 KB of source code in two files to your project and can be used with few restrictions.