Metadata API for E-mail

This document is obsolete for Evolution's Tracker EPlugin in Tracker master and in releases newer than 0.7: the plugin no longer uses this API and instead uses SPARQL Update directly against tracker-store.

Authors

  • Philip Van Hoof <philip at codeminded dot be>

About

Several applications on our desktop computers and mobile devices want to know about the metadata of E-mails. This document attempts to specify a D-Bus API for retrieving this metadata from E-mail clients. The specification is designed so that interested components are kept up to date: the E-mail client actively sends new data, which makes this a push mechanism rather than a pull mechanism.

Status

  • Implemented in Tracker as an EPlugin for Evolution. If you want these D-Bus interfaces on your system, install Tracker and its Evolution EPlugin together with Evolution and Evolution Data Server 2.25.5 or higher.
  • Open question for FDO or XDG-list: org.freedesktop.email.metadata (current) or org.freedesktop.metadata.email
  • As an (official) FDO specification this is a draft

Why no D-Bus signals?

Because not everything on your desktop wants to receive these events as broadcast messages. Instead, you want the E-mail client to send the data, especially the initial data, peer to peer. There's also a small amount of state involved: the last_modseq. With broadcast D-Bus signals it would not be possible for a metadata recipient to tell the E-mail client exactly where it should start sending from, unless you added a cookie to the D-Bus signals, of course. That sounded even uglier to me, to be honest. It would also be possible to let the client pass a D-Bus object that has signals that the service would invoke. I didn't see a reason to pick that over a D-Bus interface with methods that have to be implemented.

List of E-mail clients implementing this

Feel free to add your own E-mail client to this list if it implements this specification.

E-mail client | Service                 | Status                                                   | Version since support
Evolution     | org.gnome.evolution     | Implemented as an EPlugin in Tracker                     | Evolution: 2.25.5, Tracker: 0.7.0
KMail         | org.kde.kmail           | Implemented natively in KMail                            | KDE: 4.3, Tracker: 0.7.0
Modest        | In progress             | In progress                                              | N/A
Thunderbird   | com.mozilla.thunderbird | In progress (git://git.mymadcat.com/thunderbird-tracker) | N/A

Ontologies

More predicates might be added to this list at any later time. A list predicate is passed by simply repeating the predicate. If a list predicate's value is the empty string, it means that the list is to be reset. Also take a look at the section "List field predicates" below.
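To illustrate the list-predicate rules described here, the following is a minimal Python sketch of how a registrar might fold the parallel predicates and values arrays into one record. The function name fold_predicates and the LIST_PREDICATES set are assumptions made for this example; they are not part of the specification.

```python
# Hypothetical helper, not part of the specification.
# Only the shared list predicates named in this document are listed here.
LIST_PREDICATES = {"EMailMeta:MessageTo", "EMailMeta:MessageCc"}


def fold_predicates(predicates, values, list_predicates=LIST_PREDICATES):
    """Fold parallel predicate/value lists into a dict.

    Repeated list predicates accumulate into a list, an empty string
    resets the list, and all other predicates keep their last value.
    """
    record = {}
    for predicate, value in zip(predicates, values):
        if predicate in list_predicates:
            if value == "":
                record[predicate] = []  # empty string: reset the list
            else:
                record.setdefault(predicate, []).append(value)
        else:
            record[predicate] = value
    return record


record = fold_predicates(
    ["EMailMeta:MessageSubject", "EMailMeta:MessageCc", "EMailMeta:MessageCc"],
    ["O, hai!", "Some Body 1 <some@body1.com>", "Some Body 2 <some@body2.com>"],
)
print(record["EMailMeta:MessageCc"])
```

A registrar that also handles client-specific list predicates (for example Evolution:MessageTag) would pass a larger set as list_predicates.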

Shared

EMailMeta:MessageSubject

The subject of the message

EMailMeta:MessageSent

A date formatted as a string indicating when the E-mail was sent. Use the ISO 8601 format "YYYY-MM-DDThh:mm:ss.sTZD" (eg 1997-07-16T19:20:30.45+01:00)

EMailMeta:MessageFrom

A string with one E-mail address indicating who sent the E-mail (e.g. "Real Name <real@name.com>")

EMailMeta:MessageTo

A string with one E-mail address indicating to whom the E-mail was sent (e.g. "Real Name <real@name.com>"). This is a list predicate.

EMailMeta:MessageCc

A string with one E-mail address indicating to whom the E-mail was sent in "Carbon Copy" (e.g. "Real Name <real@name.com>"). This is a list predicate.

EMailMeta:MessageSeen

"True" or "False" indicating whether or not the message was seen by the user

EMailMeta:MessageAnswered

"True" or "False" indicating whether or not the message was marked as answered

EMailMeta:MessageDeleted

"True" or "False" indicating whether or not the message was marked for deletion. This is not the same as the message having been "unset". A message that gets unset means that it's "wiped" or "expunged". EMailMeta:MessageDeleted=True means that the message only got flagged for deletion. It's likely that soon after an unset will occur for this message, but not certain. When the value is set to False it means that the user undeleted a message.

EMailMeta:MessageSize

The predicted size of the full message. This is not always accurate and could for example depend on the message's transfer-encoding.

EMailMeta:MessageForwarded

"True" or "False" indicating whether or not the message was forwarded

Your E-mail client's specific fields

Yes, feel free to add it here. This is a wiki, go ahead.

KMail's specific fields (KDE desktop)

Unfinished, being implemented as we speak

KMail:MessageIdMD5

The message ID as an MD5-hashed string

KMail:MessageUID

The UID of the E-mail in a string as known on the service. For example the UID at the IMAP server of the message.

KMail:MessageTag

String containing a tag of the message. This is a list predicate.

KMail:MessageSerNum

The message serial number in KMail, as a string. This number is unique per E-mail within KMail.

KMail:MessageSpam

"True" or "False" indicating whether or not the message was marked as spam, either by the E-mail client's spam plugins or by the user

KMail:MessageHam

"True" or "False" indicating whether or not the message was marked as ham, either by the E-mail client's spam plugins or by the user

Modest's specific fields (Maemo mobile)

Unfinished

Evolution's specific fields (GNOME desktop)

You can find the code that defines these fields here: tracker-evolution-common.h

Evolution:MessageFile

An empty string, or the name of a file that contains either the source of the E-mail in RFC822 or MBox format, or a decoded attachment (the "Save File As" format of an attachment). If the filename ends with "/!num" then num is the seek position in the file. The / here means the directory separator of the operating system. If the empty string is used as the value, it means that no file is available yet. This is a list predicate, as multiple files can be involved.
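The "/!num" suffix can be separated from the filename with a few lines of code. The following Python sketch shows one way to do it; the function name split_message_file and its return convention are assumptions made for this example and do not come from Evolution or Tracker.

```python
import os
import re


def split_message_file(value, sep=os.sep):
    """Split an Evolution:MessageFile value into (path, seek_position).

    Returns (None, None) for the empty string (no file available yet)
    and (path, None) when there is no "!num" suffix.
    """
    if value == "":
        return None, None
    # The suffix is: directory separator, "!", then the decimal seek position.
    match = re.search(re.escape(sep) + r"!(\d+)$", value)
    if match:
        return value[:match.start()], int(match.group(1))
    return value, None


print(split_message_file("/home/user/mail/mbox/!1024", sep="/"))
```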

Evolution:MessageUid

The UID of the E-mail in a string as known on the service. For example the UID at the IMAP server of the message.

Evolution:MessageFlagged

"True" or "False" indicating whether or not the message was marked as flagged

Evolution:MessageTag

A tag placed on a message formatted as "key=value". For boolean keys the value will be either "True" or "False". For example: "school=True" for E-mails that are tagged with "school". This is a list predicate.
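A registrar will usually want to split such a tag into its key and value. A minimal Python sketch, assuming that any value other than "True" or "False" should simply be kept as a string (parse_tag is a hypothetical helper name):

```python
def parse_tag(value):
    """Split a "key=value" tag; map "True"/"False" to Python booleans."""
    key, _, raw = value.partition("=")
    if raw == "True":
        return key, True
    if raw == "False":
        return key, False
    return key, raw


print(parse_tag("school=True"))
```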

Evolution:MessageJunk

"True" or "False" indicating whether or not the message was marked as junk, either by the E-mail client's spam plugins or by the user

org.freedesktop.email.metadata.Manager

Path: /org/freedesktop/email/metadata/Manager.

This is the object that a registrar must use to register itself as a component that is interested in metadata.

<?xml version="1.0" encoding="UTF-8"?>
<node>
  <interface name="org.freedesktop.email.metadata.Manager">

Meaning of modification sequence

The modification sequence is a number that points to a certain state in the past. As the last_modseq parameter, the registrar gives you (the E-mail source service) the value that you passed as modseq the last time you pushed anything to that registrar. Suppose, for example, that you passed 100 as modseq in a SetMany and the registrar then went away. When the registrar comes back, it calls Register and passes that 100 as last_modseq, whereas your current modification sequence is now 112. You then know that the difference that you need to push to the registrar is 112 - 100, that is, the changes made after 100 up to 112. This gives you a method to determine the exact delta that you must push. If your finest resolution is a second, then this number is compatible with time().
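On the E-mail client side, determining the delta can be as simple as filtering a change log. The Python sketch below assumes that the client keeps such a log with one entry per change; the Change class and the changes_since function are hypothetical names for this example and do not describe how Evolution or KMail actually store their state.

```python
from dataclasses import dataclass


@dataclass
class Change:
    modseq: int   # modification sequence at which the change was recorded
    subject: str  # URI of the E-mail that changed


def changes_since(log, last_modseq):
    """Return the changes that a registrar with this last_modseq has not seen."""
    return [change for change in log if change.modseq > last_modseq]


log = [
    Change(99, "email://a"),
    Change(100, "email://b"),
    Change(105, "email://c"),
    Change(112, "email://d"),
]
pending = changes_since(log, 100)
print([change.subject for change in pending])
```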

Registering yourself

    <method name="Register">
      <annotation name="org.freedesktop.DBus.GLib.Async" value="true"/>
      <arg type="o" name="registrar_path" direction="in" />
      <arg type="u" name="last_modseq" direction="in" />
    </method>

Parameters:

  • registrar_path: The path of your registrar

  • last_modseq: Last received modification sequence. The E-mail client uses this to know which updates it must push and when.

  </interface>
</node>

List field predicates

List field predicates EMailMeta:MessageCc and EMailMeta:MessageTo are passed by having the same predicate ID multiple times.

SetMany ("mailURI", ["EMailMeta:MessageCc", "EMailMeta:MessageCc","EMailMeta:MessageCc"],
                    ["Some Body 1 <some@body1.com>",
                     "Some Body 2 <some@body2.com>",
                     "Some Body 3 <some@body3.com>"] )

org.freedesktop.email.metadata.Registrar

Path: The E-mail client uses sender = dbus_g_method_get_sender (context); in Register for the service, and registrar_path as passed for Register (see above) as path. Certain people refer to this technique as a DBus register pattern.

This is the interface your registrar must implement to receive metadata updates. The E-mail client will call these methods on your registrar. The method calls will inform you about unsets, updates and sets. A set and an update are the same thing: you just always assume the last set intends to update your already existing record. If a set occurs for a record that doesn't exist yet, you simply assume that you can or should create it. Or ignore it, if you don't need it.

<?xml version="1.0" encoding="UTF-8"?>
<node>
  <interface name="org.freedesktop.email.metadata.Registrar">

Sets metadata

    <method name="Set">
      <annotation name="org.freedesktop.DBus.GLib.Async" value="true"/>
      <arg type="s" name="subject" direction="in" />
      <arg type="as" name="predicates" direction="in" />
      <arg type="as" name="values" direction="in" />
      <arg type="u" name="modseq" direction="in" />
    </method>

Parameters:

  • subject: the URI of the resource

  • predicates: predicates about subject to set

  • values: the values for the predicates of subject

  • modseq: the current modification sequence

    <method name="SetMany">
      <annotation name="org.freedesktop.DBus.GLib.Async" value="true"/>
      <arg type="as" name="subjects" direction="in" />
      <arg type="aas" name="predicates" direction="in" />
      <arg type="aas" name="values" direction="in" />
      <arg type="u" name="modseq" direction="in" />
    </method>

Parameters:

  • subjects: the URIs of the resources

  • predicates: predicates about subjects to set

  • values: the values for the predicates of subjects

  • modseq: the current modification sequence

Unsets metadata

    <method name="Unset">
      <annotation name="org.freedesktop.DBus.GLib.Async" value="true"/>
      <arg type="s" name="subject" direction="in" />
      <arg type="u" name="modseq" direction="in" />
    </method>

Parameters:

  • subject: the URI of the resource to remove

  • modseq: the current modification sequence

    <method name="UnsetMany">
      <annotation name="org.freedesktop.DBus.GLib.Async" value="true"/>
      <arg type="as" name="subjects" direction="in" />
      <arg type="u" name="modseq" direction="in" />
    </method>

Parameters:

  • subjects: the URIs of the resources to remove

  • modseq: the current modification sequence

Cleanup metadata

    <method name="Cleanup">
      <annotation name="org.freedesktop.DBus.GLib.Async" value="true"/>
      <arg type="u" name="modseq" direction="in" />
    </method>

Parameters:

  • modseq: the current modification sequence

This method is a request to clean up all records that originated from this service. It will be called, for example, whenever the last_modseq passed to Register (see above) was too old. It allows the E-mail client to tell your registrar to start over, because the state of the registrar's E-mail metadata is potentially too old for normal synchronization.

  </interface>
</node>

How to implement a metadata consumer

A complete example written in Vala can be found here.

P.S. The code below is written in a hypothetical, non-existent programming language that looks a little like Vala mixed with C# and Java, with some added high-level language features for integrating with D-Bus.

public class My.Registrar implements org.freedesktop.email.metadata.Registrar {

  // Persisted between runs, so that the next Register call can pass it along
  [Stored]
  private uint last_modseq = 0;

  public Registrar () {
    org.freedesktop.email.metadata.Manager manager =
          get_manager_for_service ("org.gnome.evolution");
    manager.Register (this, last_modseq);
  }

  // Creates the record for subject if it doesn't exist yet, else updates it
  private void SetInternal (string subject, string[] predicates, string[] values) {
    Resource r = TripleStore.Get ("Evolution", subject);
    int i = 0;
    bool create = false;

    if (r == null) {
      r = new Resource ();
      create = true;
    }

    // predicates and values are parallel arrays
    foreach (string predicate in predicates) {
      r.update (predicate, values[i]);
      i++;
    }

    if (create)
      TripleStore.Create ("Evolution", r);
  }

  public void Set (string subject, string[] predicates, string[] values, uint modseq) {
    this.SetInternal (subject, predicates, values);
    last_modseq = modseq;
  }

  // Start over: drop everything that originated from this service
  public void Cleanup (uint modseq) {
    TripleStore.Cleanup ("Evolution");
    last_modseq = modseq;
  }

  public void SetMany (string[] subjects, string[][] predicates, string[][] values, uint modseq) {
    int i = 0;
    foreach (string subject in subjects) {
      this.SetInternal (subject, predicates[i], values[i]);
      i++;
    }
    last_modseq = modseq;
  }

  public void UnsetMany (string[] subjects, uint modseq) {
    foreach (string subject in subjects) {
      this.UnsetInternal (subject);
    }
    last_modseq = modseq;
  }

  // An unset is an expunge: the record is removed entirely
  private void UnsetInternal (string subject) {
    TripleStore.Remove ("Evolution", subject);
  }

  public void Unset (string subject, uint modseq) {
    this.UnsetInternal (subject);
    last_modseq = modseq;
  }
}

Example

In this example I use the modseq 1245 twice and afterwards the new modification sequence 1246. The number passed to the test application must be the number that we last received from either Set, SetMany, Unset, UnsetMany or Cleanup. The E-mail client is allowed to give you numbers like 1, 2, 3 and 4; it isn't required to give any particular meaning to this number. Only the E-mail client has to understand the meaning that it itself gave to the number: as a registrar, don't do calculations with it.

This is why the number is called a modification sequence and not a timestamp. It might look like a timestamp to you, but by specification it is NOT a timestamp. It's a modification sequence.

Assume we're at modification sequence 1245 and we run the test application with it. While the test application is running, the user sets the \Seen property of an E-mail. We now see this event being caught by the test application. We also get a new modification sequence, number 1246.

$ ./valaclientsample 1245
Registrar for org.gnome.evolution
Activating org.gnome.evolution
[1246] setmany (org.gnome.evolution): 1
- [1246] setmany (org.gnome.evolution): email://1244374827.7697.0@lors/INBOX/14951
^C

Instead of using the new modification sequence we repeat using 1245. We get the setmany again:

$ ./valaclientsample 1245
Registrar for org.gnome.evolution
Activating org.gnome.evolution
[1247] setmany (org.gnome.evolution): 1
- [1247] setmany (org.gnome.evolution): email://1244374827.7697.0@lors/INBOX/14951

This time we do use the new modification sequence, and we see that no setmany calls arrive:

$ ./valaclientsample 1246
Registrar for org.gnome.evolution
Activating org.gnome.evolution

How the E-mail client will behave and how your registrar should behave

Note

I have only written this down for Evolution at this moment. KMail support is being implemented as we speak by the same developer who implemented the EPlugin for Evolution (that's me). I will try to make sure that the observable behaviour is as generic and shared as possible. Not all applications have the same capabilities, but I will also attempt to make a list of the minimal set of expected behaviours.

Don't kill me for only mentioning Evolution here at this moment. I'm working hard to involve as many E-mail clients as possible.

Evolution

You'll pass a last_modseq parameter. If that parameter is 0, Evolution will immediately start with an initial import of all E-mails into your registrar. If your last_modseq is too old, Evolution will start with a Cleanup call followed by a new initial import of all E-mails into your registrar. During this import it will mostly be calling your SetMany method. You should implement SetMany in such a way that it can cope with quite a lot of data: Evolution groups the items together so that it doesn't hammer D-Bus too much. It will, for example, group the metadata of ~ 2000 E-mails together in one SetMany call.
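The behaviour described in this paragraph can be summarised as a small decision procedure. The following Python sketch is an illustration only and is not Evolution's actual code: the names plan_registration and chunked are hypothetical, and the rule used for "too old" (a last_modseq older than the oldest change the client still has logged) is an assumption, since this document does not define it.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def plan_registration(last_modseq, oldest_logged_modseq,
                      all_subjects, changed_subjects, batch_size=2000):
    """Return the ordered (method, subjects) calls to make on the registrar."""
    calls = []
    if last_modseq == 0:
        # New registrar: initial import of everything.
        to_send = all_subjects
    elif last_modseq < oldest_logged_modseq:
        # Assumed meaning of "too old": start over with a Cleanup.
        calls.append(("Cleanup", []))
        to_send = all_subjects
    else:
        # Normal case: only push the delta.
        to_send = changed_subjects
    for batch in chunked(to_send, batch_size):
        calls.append(("SetMany", batch))
    return calls


subjects = ["email://user@mailserver/INBOX/%d" % i for i in range(4500)]
for method, batch in plan_registration(0, 50, subjects, []):
    print(method, len(batch))
```

Grouping into batches keeps the number of D-Bus calls low during an initial import, at the cost of larger messages that the registrar has to be able to handle.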

You should store the modseq whenever you receive either a Set, Unset, SetMany, UnsetMany or Cleanup. Evolution is indeed an observable and you are an observer: you want to be notified of the changes that happened "while you were not observing" too, of course. Evolution will calculate the delta of changes "since your last modseq" (because that's what you register yourself with as last_modseq parameter) and it will push that delta to your registrar.

While Evolution is running, and if your registrar keeps itself logged on, it might throw additional updates in your direction. Evolution will usually use the SetMany method for those. It's possible that Evolution uses Set, though. There are no API guarantees whether Set or SetMany will be used for this.

When Evolution deletes an E-mail, it will call either Unset or UnsetMany. This is an "expunge" event, it's not just "mark as deleted"! The EMailMeta:MessageDeleted predicate tells you when a message got "marked for deletion". It's possible that an Unset or UnsetMany will take place shortly after a Set or SetMany with EMailMeta:MessageDeleted=True. It's NOT certain, however. It's also not the same thing. In case of a Set or SetMany with EMailMeta:MessageDeleted=False you should recover from the "deleted" state. An Unset or UnsetMany means that the message is "permanently" deleted. The EMailMeta:MessageDeleted predicate says nothing about permanent deletions.
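The difference between "marked for deletion" and "expunged" is easy to see in a small in-memory registrar. The Python sketch below is a simplified illustration: the class InMemoryRegistrar is a hypothetical name, it is not wired to D-Bus, and it stores every predicate as a single value (so it ignores list predicates).

```python
class InMemoryRegistrar:
    """Toy registrar that keeps records in a dict keyed by subject URI."""

    def __init__(self):
        self.records = {}
        self.last_modseq = 0

    def set(self, subject, predicates, values, modseq):
        # A set creates the record if needed, else updates it.
        record = self.records.setdefault(subject, {})
        record.update(zip(predicates, values))
        self.last_modseq = modseq

    def unset(self, subject, modseq):
        # An unset is an expunge: the record disappears completely.
        self.records.pop(subject, None)
        self.last_modseq = modseq

    def is_marked_deleted(self, subject):
        record = self.records.get(subject, {})
        return record.get("EMailMeta:MessageDeleted") == "True"


registrar = InMemoryRegistrar()
uri = "email://user@mailserver/INBOX/1"
registrar.set(uri, ["EMailMeta:MessageDeleted"], ["True"], 5)
print(registrar.is_marked_deleted(uri), uri in registrar.records)
registrar.unset(uri, 6)
print(uri in registrar.records)
```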

Evolution will stop sending you these events if you unregister (just disconnect the DBus proxy. In DBus-GLib you do this by finalizing the DBusGProxy, or call the ReleaseName method on DBUS_PATH_DBUS with your DBus object's path as parameter). You should record the last modseq that you got in either Set, Unset, SetMany, UnsetMany or Cleanup for the next time you register yourself at Evolution's Manager as a registrar (use that number to register yourself).
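Recording the last modseq means that it has to survive a restart of your registrar. One possible approach, sketched in Python below, is a small JSON file that is replaced atomically on every update. The class name ModseqStore and the file layout are assumptions made for this example; a real registrar would more likely keep the number in the same store as the metadata itself.

```python
import json
import os
import tempfile


class ModseqStore:
    """Persist the last received modseq in a small JSON file."""

    def __init__(self, path):
        self.path = path

    def load(self):
        """Return the stored modseq, or 0 to request an initial import."""
        try:
            with open(self.path, "r", encoding="utf-8") as handle:
                return int(json.load(handle)["last_modseq"])
        except (OSError, ValueError, KeyError, TypeError):
            return 0

    def save(self, modseq):
        """Write a temporary file first, then atomically replace the store."""
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump({"last_modseq": modseq}, handle)
        os.replace(tmp_path, self.path)


store = ModseqStore(os.path.join(tempfile.mkdtemp(), "modseq.json"))
print(store.load())  # nothing stored yet, so this asks for an initial import
store.save(1246)
print(store.load())
```

The value returned by load() is what the registrar would pass as last_modseq to Register, and save() would be called from Set, SetMany, Unset, UnsetMany and Cleanup.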

Relevant bugs for Evolution's implementation

Sample data

Format of this sample data is Turtle. This doesn't mean that it goes over the D-Bus wire in this format. It's just a notation format for RDF triples.

The sample data's format goes like this:

<subject> <predicateA> "valueA" ;
          <predicateB> "valueB" ;
          <predicateN> "valueN" .

Sample data in Turtle format:

Again, sorry for making an example that would come from Evolution. I will add examples for KMail as soon as its implementation is finished.

<email://user@mailserver/INBOX/1> <rdf:type> <Evolution:Message> ;
        <EMailMeta:MessageTo> "Some1 Body <some1@body.com>" ;
        <EMailMeta:MessageFrom> "Some2 Body <some2@body.com>" ;
        <EMailMeta:MessageCc> "Some3 Body <some3@body.com>" ;
        <EMailMeta:MessageCc> "Some4 Body <some4@body.com>" ;
        <EMailMeta:MessageTo> "Some5 Body <some5@body.com>" ;
        <EMailMeta:MessageSubject> "O, hai!" ;
        <EMailMeta:MessageSent> "1997-07-16T19:20:30.45+01:00" ;
        <Evolution:MessageFile> "/home/user/.evolution/mail/imap/user@mailserver/folders/INBOX/1." ;
        <Evolution:MessageUid> "1" ;
        <EMailMeta:MessageSeen> "True" ;
        <Evolution:MessageJunk> "False" ;
        <EMailMeta:MessageAnswered> "False" ;
        <Evolution:MessageFlagged> "False" ;
        <Evolution:MessageTag> "about-school=True" ;
        <Evolution:MessageTag> "from-friends=True" ;
        <EMailMeta:MessageForwarded> "False" ;
        <EMailMeta:MessageDeleted> "False" ;
        <EMailMeta:MessageSize> "4000" .

Apps/Evolution/Metadata (last edited 2013-08-08 22:50:06 by WilliamJonMcCann)