Media Art Storage Spec

Version: This is a Draft

Authors

Goal

This specification provides a mechanism for applications to store and retrieve artwork associated with media content, like music from an album, the logo for a radio station, or a graphic representing a podcast. The storage medium for artwork is a file system, inside a user's home directory.

UPDATE 2010-02-18: Banshee now implements a simplified version of this spec, see bug 520516, comment 58

UPDATE 2013-10-02: There is now a GNOME platform library which implements this spec, see https://git.gnome.org/browse/libmediaart/

Things to think about

  • Support for saving multiple pieces of artwork for a given identifier (e.g. the front cover, the back cover, pages in a booklet)

    • What about using the prefix for this? (album-front-MD5.jpeg, album-back-MD5.jpeg, etc.)
  • A way to mark artwork as temporary, to be garbage-collected from the cache after some time (e.g. when playing from last.fm, you probably don't want to keep the cover art for long if the album is not stored locally in the user's database)

Dependencies

This specification depends on the XDG Base Directory Specification for the definition of the XDG_CACHE_HOME variable.

Generating a unique and safe file name

Stripping characters

Every identifier mentioned below must have unwanted characters stripped from it, as follows:

  • Any matched pair of (), {}, [] or <>, together with everything written between the two. For example (01), [1], [10], <etc>, {f (bar) oo} must be stripped away

  • The list of unwanted characters is as follows: ()_{}[]!@#$^&*+=|\/"'?<>~`

    • There are many, many tracks and albums with titles entirely composed of characters from this set -- stripping them will prevent accurate cover art from being stored by any conforming media player. If the user has many different spellings of the same artist/album, I doubt they'll mind a few extra copies of the cover art hanging around.
  • Two consecutive whitespace characters must be turned into one space
  • Leading spaces must be removed
  • Trailing spaces must be removed
  • Any tab is turned into a space
  • Non-ASCII text must be normalized to its non-combined, compatibility form (NFKD), using the g_utf8_normalize() function (or equivalent).

Also check out the sample character strip code in C.
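
The stripping rules above can be sketched in Python as follows. This is an illustration only, not the sample C code referred to above: the function name strip_identifier is made up for this example, and the order in which the rules are applied (brackets, unwanted characters, whitespace, then NFKD) is an assumption, since the list does not state one.

```python
import re
import unicodedata

# One innermost bracket pair of any kind: no bracket of the same kind inside.
_BRACKETED = re.compile(r"\([^()]*\)|\{[^{}]*\}|\[[^\[\]]*\]|<[^<>]*>")

# The unwanted characters listed in the specification.
_UNWANTED = "()_{}[]!@#$^&*+=|\\/\"'?<>~`"
_DROP_UNWANTED = str.maketrans("", "", _UNWANTED)


def strip_identifier(text):
    """Apply the character stripping rules to one identifier (sketch)."""
    # Remove bracketed blocks; repeat so that nested pairs disappear too.
    previous = None
    while previous != text:
        previous = text
        text = _BRACKETED.sub("", text)
    # Drop any remaining unwanted characters (e.g. unmatched brackets).
    text = text.translate(_DROP_UNWANTED)
    # Tabs become spaces, runs of spaces collapse to a single space.
    text = text.replace("\t", " ")
    text = re.sub(r" {2,}", " ", text)
    # Remove leading and trailing spaces.
    text = text.strip(" ")
    # Normalize to the decomposed compatibility form (NFKD).
    return unicodedata.normalize("NFKD", text)
```

Note that a title made up entirely of unwanted characters comes out as an empty string, which is the case the comment in the list above is concerned about.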

Identifiers

You have to create two identifiers for the media. What to use is specified here:

  • Prefix "podcast" in case the media is a Podcast as A
  • The MD5 of the lowercase of podcast's title, filtered using character stripping, in case the media is a Podcast, as B
  • The MD5 of " " (whitespace), as C
    • Why? The "podcast-" prefix already prevents collisions with other works from the same artist. There's no need to add a constant string to the end of the filename.
  • Prefix "radio" in case the media is a Radio station's stream, as A
  • The MD5 of lowercase of the radio's station name, filtered using character stripping, in case the media is a Radio station's stream, as B
  • The MD5 of " " (whitespace), as C
    • Ditto as for podcasts.
  • Prefix "album" in case the media is part of an album, as A
  • Take the MD5 of the lowercase of artist, filtered using character stripping. If there is no artist, fall back to the lowercase of track-artist, filtered using character stripping; if that is missing too, fall back to " " (whitespace). This is B.
    • Using the track artist for per-album art might prevent it from being cached properly. Better to use the album artist only. It's the responsibility of the media player to decide what the album artist is, in the case of absent metadata.
  • Take the MD5 of the lowercase of album, filtered using character stripping, falling back to " " (whitespace). This is C.
  • Prefix "artist" in case the art represents an artist, as A
  • Take the MD5 of the lowercase of artist, filtered using character stripping. If there is no artist, fall back to the lowercase of track-artist, filtered using character stripping; if that is missing too, fall back to " " (whitespace). This is B.
    • Using the track artist for per-album art might prevent it from being cached properly. Better to use the album artist only. It's the responsibility of the media player to decide what the album artist is, in the case of absent metadata.
  • Take the MD5 of the lowercase of album, filtered using character stripping, falling back to " " (whitespace). This is C.
  • Artist art is different from album art in that artist art focuses on the artist instead of on the album
  • Prefix "track", for per-track cover art (such as NIN's "Ghosts" album)
  • track-{digest track-artist}-{digest album-title}-{digest track-title}

If you want to store custom named art, you are required to use your product name as A, the MD5 of whatever you come up with as B, and the MD5 of whatever you come up with as C. Any prefix A that you choose should match [a-z0-9]+. These self-chosen prefixes should not be mentioned here unless you plan to follow the exact specification for the prefix as defined in this specification (consider this specification as a global registration for prefixes).

Digests are calculated using the text encoded in UTF-8.
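
As a rough sketch of the rules above, the identifiers for an album or a radio station could be computed like this. The function names are hypothetical, the inputs are assumed to have already gone through character stripping, and applying the " " fallback to every missing value (not only artist and album) is an assumption of this example.

```python
import hashlib


def identifier_digest(stripped_text):
    """MD5 hex digest of the lowercased text, encoded as UTF-8.

    A missing or empty value falls back to a single space.
    """
    text = (stripped_text or "").lower()
    if not text:
        text = " "
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def album_identifiers(album_artist, track_artist, album_title):
    """Return (A, B, C) for album art: artist first, then track artist."""
    artist = album_artist or track_artist
    return ("album", identifier_digest(artist), identifier_digest(album_title))


def radio_identifiers(station_name):
    """Return (A, B, C) for a radio station; C is the digest of " "."""
    return ("radio", identifier_digest(station_name), identifier_digest(None))
```

For example, album_identifiers("Metallica", None, "And Justice For All") yields the prefix "album" followed by two 32-character hexadecimal digests.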

The storage

The location of the album-art image files is $XDG_CACHE_HOME/media-art. To determine the filename, concatenate A, "-", B, "-", C and ".jpeg".

$XDG_CACHE_HOME/media-art/A-B-C.jpeg
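
A minimal sketch of building that path in Python is shown below. The function name media_art_path is hypothetical, and the inputs are assumed to be already stripped and lowercased. The fallback to ~/.cache when XDG_CACHE_HOME is unset or empty follows the XDG Base Directory Specification.

```python
import hashlib
import os


def media_art_path(prefix, b_text, c_text):
    """Return $XDG_CACHE_HOME/media-art/A-B-C.jpeg for the given identifiers."""
    # XDG Base Directory default when the variable is unset or empty.
    cache_home = os.environ.get("XDG_CACHE_HOME") or os.path.join(
        os.path.expanduser("~"), ".cache"
    )

    def md5_hex(text):
        # Digests are calculated on the UTF-8 encoding of the text.
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    filename = f"{prefix}-{md5_hex(b_text)}-{md5_hex(c_text)}.jpeg"
    return os.path.join(cache_home, "media-art", filename)
```

Calling media_art_path("album", "metallica", "and justice for all") returns a path inside the media-art directory whose file name starts with "album-" and ends with ".jpeg".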

Additionally, while an application is writing data to this file, it should further append a .part suffix to the path. Only when the application is done writing to the file (e.g. once a download from a web service has finished) should the file be moved to the final path. This prevents other applications from reading partially downloaded artwork.

However, applications which will download and store artwork according to this specification should check for the .part file before downloading, in case another application is already downloading the same artwork.
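
The .part convention could be implemented along these lines. This is a sketch with hypothetical function names; it does not clean up a .part file left behind by a failed write, which a real application would need to handle.

```python
import os


def download_needed(final_path):
    """False if the artwork exists or another writer's .part file is present."""
    return not (os.path.exists(final_path) or os.path.exists(final_path + ".part"))


def store_artwork(final_path, image_bytes):
    """Write via a .part file, then move it to the final path.

    Returns False without writing if a .part file already exists.
    """
    part_path = final_path + ".part"
    directory = os.path.dirname(final_path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    try:
        # Mode "x" fails if the .part file already exists, so two writers
        # cannot both claim the same artwork.
        with open(part_path, "xb") as part_file:
            part_file.write(image_bytes)
    except FileExistsError:
        return False
    # Readers only ever see the complete file under its final name.
    os.replace(part_path, final_path)
    return True
```

Creating the .part file exclusively makes the check and the claim a single step, which avoids a race between two applications that both looked for the .part file at the same moment.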

Example: Album of Metallica, And Justice For All

Lowercase of "Metallica" is "metallica" and for "And Justice For All" is "and justice for all". The MD5 digests of those two are 3c2234a7ce973bc1700e0c743d6a819c and 3d422ba022ae0daa8f5454ba7dfa0f9a. We separate them with a dash, prepend $XDG_CACHE_HOME/media-art/, the kind (which is "album") and a dash, and append ".jpeg". That gives us:

A = album
B = "metallica"
C = "and justice for all"

/home/user/.cache/media-art/album-3c2234a7ce973bc1700e0c743d6a819c-3d422ba022ae0daa8f5454ba7dfa0f9a.jpeg

Example: Artist art for Metallica, And Justice For All

Lowercase of "Metallica" is "metallica" and for "And Justice For All" is "and justice for all". The MD5 digests of those two are 3c2234a7ce973bc1700e0c743d6a819c and 3d422ba022ae0daa8f5454ba7dfa0f9a. We separate them with a dash, prepend $XDG_CACHE_HOME/media-art/, the kind (which is "artist") and a dash, and append ".jpeg". That gives us:

A = artist
B = "metallica"
C = "and justice for all"

/home/user/.cache/media-art/artist-3c2234a7ce973bc1700e0c743d6a819c-3d422ba022ae0daa8f5454ba7dfa0f9a.jpeg

The difference with album-art is that artist art focuses on the artist, instead of on the album.

Example: Radio ga ga

A = radio
B = "radio ga ga"
C = " "

/home/user/.cache/media-art/radio-b924ce08955675c6a30c745d18286d21-7215ee9c7d9dc229d2921a40e899ec5f.jpeg 

Example: World Soccer - Daily Podcast

A = podcast
B = "world soccer"
C = "daily podcast"

/home/user/.cache/media-art/podcast-d717b10ec8fb35b11644995deb04b721-08d299536e562915eb133e2676396d3f.jpeg

Local media art

In some situations it is desirable to have a local media art repository. This is a read-only collection of media art that is shared among different users or different computers. For example, a CD-ROM could include the media art for the media it contains, so that the art does not need to be generated for every user or computer accessing this CD-ROM.

Because the URI of such media is not constant (a CD-ROM, for example, can be mounted at different locations), the media art should be stored at a path relative to the original media.

The location for local media art will be a subdirectory .mediaartlocal/ in the same directory as where the album's files are located. To determine the filename part you use the same rules as above.

* This can be tricky, because albums aren't necessarily contained in one directory. Whether art for an album is detected would depend on whether art is treated as a per-album or per-track object.

Example: Album of Metallica, And Justice For All on a USB stick

Lowercase of "Metallica" and "And Justice For All" are "metallica" and "and justice for all". The MD5 for those are 3c2234a7ce973bc1700e0c743d6a819c and 3d422ba022ae0daa8f5454ba7dfa0f9a. We separate them with a dash, we prepend to these /media/USBStick/Metallica/And Justice For All/.mediaartlocal/, the kind (which is "album"), a dash and we append ".jpeg". That gives us:

/media/USBStick/Metallica/And Justice For All/.mediaartlocal/album-3c2234a7ce973bc1700e0c743d6a819c-3d422ba022ae0daa8f5454ba7dfa0f9a.jpeg

Thumbnails of media art

Thumbnails of media art follow the thumbnail-spec. The URI used to determine the thumbnail path is the full URI pointing to the original media art. For the path to the thumbnail, refer to the thumbnail-spec itself. A media art fetcher is allowed to store the normal and large thumbnails immediately after the download of the media art is completed. A media art fetcher is, however, not required to do this by itself (the thumbnail infrastructure will or should take care of this if the media art isn't thumbnailed yet).

Example: Thumbnails of the media art for Metallica, And Justice For All

Path to the original media art (see above):

/home/user/.cache/media-art/album-3c2234a7ce973bc1700e0c743d6a819c-3d422ba022ae0daa8f5454ba7dfa0f9a.jpeg

As a URI (just prepend file:// in this case):

file:///home/user/.cache/media-art/album-3c2234a7ce973bc1700e0c743d6a819c-3d422ba022ae0daa8f5454ba7dfa0f9a.jpeg

Paths to the thumbnails according to the Thumbnail spec:

/home/user/.thumbnails/normal/d76be6150d0684adeb46cc42c0f2648c.png
/home/user/.thumbnails/large/d76be6150d0684adeb46cc42c0f2648c.png
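
The derivation above can be sketched as follows, assuming (as in the thumbnail-spec) that the thumbnail file name is the MD5 of the URI plus ".png". The function names and the thumbnail_root parameter are hypothetical, and POSIX-style absolute paths are assumed.

```python
import hashlib
import posixpath
from urllib.parse import quote


def media_art_uri(media_art_path):
    """Turn an absolute POSIX path into a file:// URI, escaping as needed."""
    return "file://" + quote(media_art_path)


def thumbnail_paths(media_art_path, thumbnail_root):
    """Return the normal and large thumbnail paths for one media art file."""
    uri = media_art_uri(media_art_path)
    digest = hashlib.md5(uri.encode("utf-8")).hexdigest()
    return {
        size: posixpath.join(thumbnail_root, size, digest + ".png")
        for size in ("normal", "large")
    }
```

Both sizes share the same file name and differ only in the directory, matching the two paths listed above.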

Heuristics for finding media art

The heuristics for media art are:

  • Matching files in the cache directory defined by this specification have priority over these rules
  • Matching files in the local media art directory defined by this specification have priority over these rules

  • If the music file embeds album art, this is the first choice (for example PIC and APIC)
  • You strip the characters (like explained above) of the album and the artist strings
  • If the music file is in a directory together with other files, and there are JPEG files in that directory, then the following rules are to be taken (priority order):
    • If the album's track-count is known, the folder may contain at most 3 more files and at most 3 fewer files than the track-count, else we fail at heuristics
    • If the album's track-count is unknown and more than 50 files or fewer than 8 files are in the folder, we fail at heuristics
    • Any file named the same as the artist, album or artist + " " + album ID3 tag plus .jpeg or .jpg. Case insensitive.
    • Any file called cover.* or front.* or album.* (jpg, jpeg). Case insensitive.
    • Any file called cover.jpeg or cover.jpg, compared case-insensitively, will serve as album art
    • If still no heuristics applied, we fail
    (These seem overly restrictive, and not relevant to the storing of cached media art. Perhaps the media players should be left to their own devices regarding how art is initially imported?)
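
For illustration, the directory part of these heuristics could look like the sketch below. It leaves out embedded art and the cache lookups, counts every file in the folder (the text does not say which files count, so this is an assumption), and uses the hypothetical name pick_cover.

```python
import os

JPEG_EXTENSIONS = (".jpeg", ".jpg")
GENERIC_NAMES = {"cover", "front", "album"}


def pick_cover(filenames, artist, album, track_count=None):
    """Return the best cover candidate among filenames, or None on failure.

    artist and album are expected to be stripped as described earlier.
    """
    total = len(filenames)
    if track_count is not None:
        # Known track count: the folder may deviate by at most 3 files.
        if abs(total - track_count) > 3:
            return None
    elif total > 50 or total < 8:
        # Unknown track count: only plausible album-sized folders qualify.
        return None

    # Keep JPEG files only, remembering each lowercased base name.
    jpegs = []
    for name in filenames:
        stem, extension = os.path.splitext(name.lower())
        if extension in JPEG_EXTENSIONS:
            jpegs.append((name, stem))

    # Names derived from the tags win over the generic cover names.
    tag_names = {artist.lower(), album.lower(), (artist + " " + album).lower()}
    for wanted in (tag_names, GENERIC_NAMES):
        for name, stem in jpegs:
            if stem in wanted:
                return name
    return None
```

Taking a plain list of file names keeps the function free of file system access, so the priority order can be checked without a real music folder.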

DraftSpecs/MediaArtStorageSpec (last edited 2013-12-02 17:53:47 by WilliamJonMcCann)