Camel.Misc

This page discusses a few independent utility functions which don't belong elsewhere, and are too small to bother with a page on their own.

Camel.UTF8

A couple of handy UTF-8 related utility functions exist in Camel.

They came about because Camel handles a lot of external data, any of which may be badly formed. The GLib APIs are poor for this: it is too easy to write code that goes into an infinite loop, or runs off the end of a string. We also cannot enforce all-or-nothing validation; validating strings and throwing them away entirely if they are broken is both expensive and pointless. We want to be able to fail gracefully, or at least not crash.

So these functions were born to simplify UTF-8 string processing. No strict Unicode validation is performed, only checks that the string obeys the basic encoding rules.

The get functions mimic the stdio fgetc interface (or, even more closely, the basic C (c=*p++) 'interface'). A pointer to the current character position is passed in, and it is advanced past each character read. The functions return 0 at the end of the string; the limit variant returns 0xFFFF if the data is truncated. The limit variant is used to scan binary or bounded data, and the normal version operates on C strings. If a character is badly formed, it is silently dropped from the input, providing cheap and transparent string validation.

These functions are used in the same way you might use a C pointer to iterate a C string.

 guint32 camel_utf8_getc(const unsigned char **ptr);
 guint32 camel_utf8_getc_limit(const unsigned char **ptr, const unsigned char *end);

Then we have a simple string creation function. It writes the character at ptr and advances ptr to the next character position; you must have at least 4 bytes available at the destination pointer (Unicode Standard V3.0, section 3.8 - each UTF-8 sequence is a maximum of 4 octets).

 void camel_utf8_putc(unsigned char **ptr, guint32 c);

A GString utility function. This should be namespaced to Camel, or may no longer be necessary; when it was written, GString didn't have UTF-8 calls, or they couldn't be found at the time.

 void g_string_append_u(GString *out, guint32 c);

Conversion functions for IMAP's version of UTF-7.

 char *camel_utf7_utf8(const char *ptr);
 char *camel_utf8_utf7(const char *ptr);

And also some helper functions for UCS2 conversion, which is used for interaction with Mozilla's NSS library (or was at one time).

 char *camel_utf8_ucs2(const char *ptr);
 char *camel_ucs2_utf8(const char *ptr);

Example: Scanning a UTF8 string

This is very simple:

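        guint32 c;
        const unsigned char *p = (const unsigned char *)"This is a unicode string";
 
        /* c is 0 at the end of the string; only pass ASCII values to isprint() */
        while ((c = camel_utf8_getc(&p)) != 0)
                printf("U+%04x: %c\n", c, (c < 0x80 && isprint(c)) ? (int)c : '.');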

Note the equivalent glib calls for a similar piece of code:

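        guint32 c;
        const gchar *p = "This is a unicode string";
 
        /* the string must be valid before g_utf8_get_char() can be trusted */
        if (!g_utf8_validate(p, strlen(p), NULL))
                return;
 
        while ((c = g_utf8_get_char(p))) {
                printf("U+%04x: %c\n", c, (c < 0x80 && isprint(c)) ? (int)c : '.');
                p = g_utf8_next_char(p);
        }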

If the string has not already been validated, it must be validated before it is passed to the GLib functions; otherwise the above loop could run off the end of the string.

Another alternative might be this:

        guint32 c;
        const gchar *p = "This is a unicode string";
 
        while (p && (c = g_utf8_get_char(p))) {
                printf("U+%04x: %c\n", c, (c < 0x80 && isprint(c)) ? (int)c : '.');
                p = g_utf8_find_next_char(p, NULL);
        }

But this effectively scans the string twice, and needs a more complex inner loop.

Camel.StringUtils

A few simple string utility functions that work on mail data types; that is, they are not based on locale. Some pre-date the g_ascii functions, or are just convenience functions.

Misc

A set of case-insensitive functions for use with a GHashTable. Hmm, the code for the hash function looks somewhat questionable.

 int camel_strcase_equal(gconstpointer a, gconstpointer b);
 guint camel_strcase_hash(gconstpointer v);
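For a hash table the one rule that matters is that two keys which compare equal must also hash to the same value. A minimal sketch of such a pair, folding ASCII case only, might look like this. The names are hypothetical and this is not the Camel code; the hash is a plain multiply-by-33 string hash chosen for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII-only lower-casing, independent of the current locale. */
static unsigned char sketch_fold(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical equal function: returns non-zero when a and b match,
 * ignoring ASCII case. */
int sketch_strcase_equal(const void *a, const void *b)
{
    const unsigned char *s = a, *t = b;

    assert(s != NULL && t != NULL);
    while (*s && sketch_fold(*s) == sketch_fold(*t)) {
        s++;
        t++;
    }
    return sketch_fold(*s) == sketch_fold(*t);
}

/* Hypothetical hash function: hashes the folded bytes, so that strings
 * which compare equal above always land in the same bucket. */
unsigned int sketch_strcase_hash(const void *v)
{
    const unsigned char *s = v;
    unsigned int h = 5381;

    assert(s != NULL);
    while (*s)
        h = h * 33 + sketch_fold(*s++);
    return h;
}
```

Both functions have the generic-pointer signatures that a GHashTable expects, so a pair like this could be handed to g_hash_table_new directly.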

Free a list of C strings/pointers stored in a GList - doesn't seem very widely useful.

 void camel_string_list_free(GList *string_list);

A bad implementation of a strstrcase function that ends up just calling the locale-specific strncasecmp anyway. Needs fixing.

 char *camel_strstrcase(const char *haystack, const char *needle);
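As a rough idea of what a locale-independent version could look like, here is a sketch that folds ASCII case by hand instead of calling strncasecmp. sketch_ascii_strstrcase is a hypothetical name; this is not a patch against Camel, just an illustration of the approach.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII-only lower-casing, independent of the current locale. */
static int sketch_ascii_lower(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Hypothetical helper: find needle in haystack, ignoring ASCII case.
 * Returns a pointer into haystack, or NULL if there is no match. */
const char *sketch_ascii_strstrcase(const char *haystack, const char *needle)
{
    size_t i;

    assert(haystack != NULL && needle != NULL);
    if (*needle == '\0')
        return haystack;

    for (; *haystack; haystack++) {
        /* Compare until the needle ends or a byte differs.  The NUL at
         * the end of haystack never matches a needle byte, so this
         * cannot read past the end of either string. */
        for (i = 0; needle[i] != '\0'
                 && sketch_ascii_lower((unsigned char)haystack[i])
                    == sketch_ascii_lower((unsigned char)needle[i]); i++)
            ;
        if (needle[i] == '\0')
            return haystack;
    }
    return NULL;
}
```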

Simple ASCII case conversion functions. The first alters its argument.

 const char *camel_strdown(char *str);
 char camel_tolower(char c);
 char camel_toupper(char c);

Camel.PString

This is something a bit more useful - it is a global reference-counted string table. Where there is the possibility that many strings may be duplicated (e.g. the To address of most e-mails), 'pstrings' may be used to store them. They are stored in a GHashTable, to ensure that each duplicate points to the same physical string, but the value stored in the table is the ref-count.

This one small utility saves significant memory managing folder data.

 const char *camel_pstring_strdup(const char *s);
 void camel_pstring_free(const char *s);
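The idea can be sketched without GLib. The version below uses a small fixed-size chained hash table rather than a GHashTable, keeps the ref-count next to the string, and is not thread-safe; all of the names are hypothetical and it is only meant to show the strdup/free pairing, not how Camel implements it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_BUCKETS 64

typedef struct SketchEntry {
    struct SketchEntry *next;   /* next entry in the same bucket */
    unsigned int refs;          /* how many callers hold this string */
    char str[];                 /* the shared string itself */
} SketchEntry;

static SketchEntry *sketch_table[SKETCH_BUCKETS];

static unsigned int sketch_bucket(const char *s)
{
    unsigned int h = 5381;

    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % SKETCH_BUCKETS;
}

/* Return the shared copy of s, creating it on first use. */
const char *sketch_pstring_strdup(const char *s)
{
    unsigned int b;
    SketchEntry *e;

    assert(s != NULL);
    b = sketch_bucket(s);
    for (e = sketch_table[b]; e != NULL; e = e->next) {
        if (strcmp(e->str, s) == 0) {
            e->refs++;          /* duplicate: share the existing copy */
            return e->str;
        }
    }
    e = malloc(sizeof(*e) + strlen(s) + 1);
    if (e == NULL)
        return NULL;
    e->refs = 1;
    strcpy(e->str, s);
    e->next = sketch_table[b];
    sketch_table[b] = e;
    return e->str;
}

/* Drop one reference; the string is freed when the last one goes. */
void sketch_pstring_free(const char *s)
{
    SketchEntry **pp, *e;

    if (s == NULL)
        return;
    for (pp = &sketch_table[sketch_bucket(s)]; (e = *pp) != NULL; pp = &e->next) {
        if (e->str == s) {
            if (--e->refs == 0) {
                *pp = e->next;
                free(e);
            }
            return;
        }
    }
}
```

The returned pointer is const because it is shared: every holder sees the same bytes, so nobody may modify them, and each strdup must be balanced by exactly one free.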

Camel.CharsetMap

This is a bunch of character set related functions for working primarily with 8-bit character sets.

They will scan blocks of characters and determine which specific character set, if any, can be used to represent all of the characters present. If none is found, they fall back to UTF-8.

There is an iterator based function and a simple-string helper.

 void camel_charset_init(CamelCharset *);
 void camel_charset_step(CamelCharset *, const char *in, int len);
 const char *camel_charset_best_name(CamelCharset *);
 
 const char *camel_charset_best(const char *in, int len);

And another helper function which will convert an ISO character set name to its corresponding Windows charset name.

 const char *camel_charset_iso_to_windows (const char *isocharset);

Camel.FileUtils

This contains a handful of miscellaneous I/O related functions as well as some binary data file encoding and decoding functions.

A function exists to create a directory hierarchy in a single call.

 int camel_mkdir(const char *path, mode_t mode);

And a function which will encode any non-filesystem-safe characters using URL encoding. This takes a file name, not a path.

 char *camel_file_util_safe_filename(const char *name);
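A sketch of the encoding step follows. Which characters count as unsafe is an assumption here (control characters plus a short, made-up list including '/' and '%'); the real function may well use a different set. sketch_safe_filename is a hypothetical name and the caller frees the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of name with
 * unsafe bytes replaced by %XX escapes.  The unsafe set is assumed. */
char *sketch_safe_filename(const char *name)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char unsafe[] = "/?()'*%";     /* assumed list */
    const unsigned char *p;
    char *out, *o;

    assert(name != NULL);
    out = malloc(strlen(name) * 3 + 1);         /* worst case: all escaped */
    if (out == NULL)
        return NULL;

    o = out;
    for (p = (const unsigned char *)name; *p; p++) {
        if (*p < 0x20 || strchr(unsafe, *p) != NULL) {
            *o++ = '%';
            *o++ = hex[*p >> 4];
            *o++ = hex[*p & 0x0f];
        } else {
            *o++ = (char)*p;
        }
    }
    *o = '\0';
    return out;
}
```

Escaping '%' itself keeps the mapping reversible, and escaping '/' is the reason this only makes sense for a single file name and not for a whole path.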

Create a save filename for a two-stage commit write. This takes a path which will have '.#' prepended to the filename part of it.

 char *camel_file_util_savename(const char *filename);
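The string manipulation involved is small enough to sketch in full. sketch_savename is a hypothetical name, and the caller is assumed to free the result.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: "/dir/name" becomes "/dir/.#name", and a bare
 * "name" becomes ".#name".  Returns a newly allocated string. */
char *sketch_savename(const char *path)
{
    const char *slash;
    size_t dirlen;
    char *out;

    assert(path != NULL);
    slash = strrchr(path, '/');
    dirlen = slash ? (size_t)(slash - path) + 1 : 0;   /* keep the '/' */

    out = malloc(strlen(path) + 3);                    /* ".#" plus NUL */
    if (out == NULL)
        return NULL;
    memcpy(out, path, dirlen);
    memcpy(out + dirlen, ".#", 2);
    strcpy(out + dirlen + 2, path + dirlen);
    return out;
}
```

In a two-stage commit the new contents are written to the save name first and then moved over the real file with rename(2), so a reader sees either the old file or the complete new one.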

And then we have interruptible but atomic read and write functions. They will listen to Evolution/Camel.Operation cancellation events on the current thread and return -1 and set errno to EINTR in the event of user cancellation. Any other signals received are ignored to ensure a full read or write.

The write function will guarantee to write all of the data or return -1 for an error.

 ssize_t camel_read(int fd, char *buf, size_t n);
 ssize_t camel_write(int fd, const char *buf, size_t n);
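The 'all of the data or -1' guarantee comes down to a retry loop around write(2). The sketch below shows only that loop; it has no cancellation support at all, so unlike the Camel function it simply retries on every EINTR. sketch_write_all is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write exactly n bytes, or return -1 with errno
 * set.  Short writes and signal interruptions are retried. */
ssize_t sketch_write_all(int fd, const char *buf, size_t n)
{
    size_t written = 0;

    assert(buf != NULL || n == 0);
    while (written < n) {
        ssize_t w = write(fd, buf + written, n - written);

        if (w == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: try again */
            return -1;          /* real error: give up */
        }
        written += (size_t)w;   /* short write: carry on with the rest */
    }
    return (ssize_t)written;
}
```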

Binary Encoders

These are used to write binary data files that, if structured properly by the calling code, should be relatively robust and compact. They are used, for example, to read and write the folder summary files, which contain information about every message in a folder.

There is a matching read and write function for each type - they must be used in strictly matching pairs, since each type may be encoded using different compression schemes.

 int camel_file_util_encode_fixed_int32(FILE *out, gint32);
 int camel_file_util_decode_fixed_int32(FILE *in, gint32 *);
 int camel_file_util_encode_uint32(FILE *out, guint32);
 int camel_file_util_decode_uint32(FILE *in, guint32 *);
 int camel_file_util_encode_time_t(FILE *out, time_t);
 int camel_file_util_decode_time_t(FILE *in, time_t *);
 int camel_file_util_encode_off_t(FILE *out, off_t);
 int camel_file_util_decode_off_t(FILE *in, off_t *);
 int camel_file_util_encode_size_t(FILE *out, size_t);
 int camel_file_util_decode_size_t(FILE *in, size_t *);
 int camel_file_util_encode_string(FILE *out, const char *);
 int camel_file_util_decode_string(FILE *in, char **);
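To show why the pairs must match, here is a sketch of one variable-length scheme for a 32-bit value: 7 bits per byte, most significant group first, with the high bit set on the final byte. This is an illustrative assumption about the kind of compression meant, not a statement of the on-disk format; the sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical encoder: small values take 1 byte, large ones up to 5.
 * Returns 0 on success, -1 on a write error. */
int sketch_encode_uint32(FILE *out, uint32_t value)
{
    int shift = 28;

    assert(out != NULL);
    /* Skip leading 7-bit groups that are entirely zero. */
    while (shift > 0 && ((value >> shift) & 0x7f) == 0)
        shift -= 7;
    /* Leading groups are written with the high bit clear. */
    for (; shift > 0; shift -= 7) {
        if (fputc((int)((value >> shift) & 0x7f), out) == EOF)
            return -1;
    }
    /* The final group has the high bit set to mark the end. */
    return fputc((int)((value & 0x7f) | 0x80), out) == EOF ? -1 : 0;
}

/* Hypothetical decoder: the exact inverse of the encoder above.
 * Returns 0 on success, -1 if the file ends mid-value. */
int sketch_decode_uint32(FILE *in, uint32_t *dest)
{
    uint32_t value = 0;
    int c;

    assert(in != NULL && dest != NULL);
    while ((c = fgetc(in)) != EOF) {
        value = (value << 7) | (uint32_t)(c & 0x7f);
        if (c & 0x80) {
            *dest = value;
            return 0;
        }
    }
    return -1;
}
```

A value written this way cannot be read back with a fixed-width reader, or the other way round, which is the reason for the strict pairing rule.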

Although these files have worked quite well over the years, there is probably room for some improvement. One problem is that although a subsystem may version multiple levels of file or individual record, the versioning is strictly only for backward compatibility. It would have been nice for data records to be sized, so that extra fields could just be appended to them without interfering with older versions.

Camel.HTMLParser

This is an internal object used by the indexing engine to strip tags from HTML files, and also the MIME decoding process to detect character set tags inside of HTML content. It is a particularly simple but adequate pull-driven HTML parser which tracks tags and content, and decodes entities.

It shouldn't be used in client code though; it is too limited and not supported.

Camel.ListUtils

These are currently unused, but were part of the work intended to move away from depending on e-util when Camel was inside Evolution.

There are doubly-linked and singly-linked list implementations which share a consistent and comparable API that can be used to implement type-safe lists, queues (FIFOs), and stacks.

Camel.DList

The doubly-linked list is based on Amiga's Exec list; the implementation originates from the public domain implementation in AmiWM. It has some particularly nice properties:

  • Adding list items to the head or tail of the list is equally efficient.
  • Removing items from the head or tail of the list is equally efficient.
  • Items can be removed from the list whilst walking it, in either direction.
  • Items can be removed without knowing the head of list pointer.
  • The list can be walked forward or backward just as efficiently, and you never walk off the list (have a NULL node pointer).
  • Can be sub-classed for type-safe lists.

Each list node must contain a next and prev pointer. These would normally be type-specific pointers at the start of the structure to be stored in the list, rather than having any direct relation to CamelDListNode.

The list header itself consists of merging two separate empty HEAD and TAIL node pointers to overlap on the common NULL pointer. The structure must be initialised appropriately before use.

The diagram shows how the list header is built from merging a data-less HEAD and TAIL node. Instead of the code having to track the head and tail nodes separately (whether they are data-less or sentinel nodes), the combined HEAD/TAIL data is stored instead of the two pointers. Depending on how you see it, the overhead is either 1 pointer, or you save three pointers compared to dummy head/tail nodes.

 struct _CamelDListNode {
        struct _CamelDListNode *next;
        struct _CamelDListNode *prev;
 };
 
 struct _CamelDList {
        struct _CamelDListNode *head;
        struct _CamelDListNode *tail;
        struct _CamelDListNode *tailpred;
 };
 
 #define CAMEL_DLIST_INITIALISER(l) { (CamelDListNode *)&l.tail, 0, (CamelDListNode *)&l.head }
 
 void camel_dlist_init(CamelDList *v);
 CamelDListNode *camel_dlist_addhead(CamelDList *l, CamelDListNode *n);
 CamelDListNode *camel_dlist_addtail(CamelDList *l, CamelDListNode *n);
 CamelDListNode *camel_dlist_remove(CamelDListNode *n);
 CamelDListNode *camel_dlist_remhead(CamelDList *l);
 CamelDListNode *camel_dlist_remtail(CamelDList *l);
 int camel_dlist_empty(CamelDList *l);
 int camel_dlist_length(CamelDList *l);

The combination of the various add/rem head/tail functions allows any combination of list, queue and stack to be implemented using the same simple API. With only the addition of an insert function, ordered lists and priority queues can easily be implemented.
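How the overlapping header makes these operations branch-free is easier to follow in code. The sketch below re-creates the technique with hypothetical Sketch names; it is not the Camel source. Like the original, it relies on viewing parts of the header as if they were nodes, which is common practice but not something the C standard strictly blesses.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchNode {
    struct SketchNode *next;
    struct SketchNode *prev;
} SketchNode;

/* head+tail double as a HEAD node, tail+tailpred as a TAIL node. */
typedef struct SketchList {
    SketchNode *head;       /* HEAD.next */
    SketchNode *tail;       /* HEAD.prev and TAIL.next: always NULL */
    SketchNode *tailpred;   /* TAIL.prev */
} SketchList;

#define SKETCH_HEAD(l) ((SketchNode *)&(l)->head)
#define SKETCH_TAIL(l) ((SketchNode *)&(l)->tail)

void sketch_list_init(SketchList *l)
{
    SketchNode *head, *tail;

    assert(l != NULL);
    head = SKETCH_HEAD(l);
    tail = SKETCH_TAIL(l);
    head->next = tail;      /* empty list: HEAD points straight at TAIL */
    head->prev = NULL;      /* the shared NULL in the middle */
    tail->prev = head;
}

SketchNode *sketch_list_addtail(SketchList *l, SketchNode *n)
{
    SketchNode *tail = SKETCH_TAIL(l);

    n->next = tail;
    n->prev = tail->prev;
    tail->prev->next = n;   /* no empty-list special case needed */
    tail->prev = n;
    return n;
}

SketchNode *sketch_list_addhead(SketchList *l, SketchNode *n)
{
    SketchNode *head = SKETCH_HEAD(l);

    n->prev = head;
    n->next = head->next;
    head->next->prev = n;
    head->next = n;
    return n;
}

/* Unlink n; both neighbours always exist, so no list pointer is needed. */
SketchNode *sketch_list_remove(SketchNode *n)
{
    n->next->prev = n->prev;
    n->prev->next = n->next;
    return n;
}

SketchNode *sketch_list_remhead(SketchList *l)
{
    SketchNode *n = SKETCH_HEAD(l)->next;

    if (n->next == NULL)    /* n is the TAIL sentinel: list is empty */
        return NULL;
    return sketch_list_remove(n);
}

int sketch_list_empty(SketchList *l)
{
    return SKETCH_HEAD(l)->next == SKETCH_TAIL(l);
}
```

Using addtail with remhead gives a queue, and addhead with remhead gives a stack; the only test for 'end of list' anywhere is whether a node's next pointer is NULL.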

Example: Adding to the list/queue/stack

In this example we will define a new node type MyNode which contains our actual data. Then we will loop, adding 100 list items to the end of the list. It is assumed that the list is properly initialised.

 struct _MyNode {
        struct _MyNode *next;
        struct _MyNode *prev;
        int size;
 };
 
 void buildlist(CamelDList *list)
 {
        int i;
        struct _MyNode *node;
  
        for (i=0;i<100;i++) {
                node = g_malloc(sizeof(*node));
 
                node->size = i;
                camel_dlist_addtail(list, (CamelDListNode *)node);
        }
 }

This example also shows that the next/prev pointers do not need to be initialised before use, saving some processing time.

Example: Walking a list, removing some items

Now we walk the list of these items, removing any with an even size.

 void remeven(CamelDList *list)
 {
        struct _MyNode *next, *node;
 
        node = (struct _MyNode *)list->head;
        next = node->next;
        /* next is NULL only when node is the tail end of the header */
        while (next) {
                if ((node->size & 1) == 0) {
                        camel_dlist_remove((CamelDListNode *)node);
                        free_mynode(node);
                }
                node = next;
                next = node->next;
        }
 }

Example: Clearing a list, node at a time

Another operation is to remove all of the items in a list. We could just walk the list and free the items as we go, and then reset the list node, or we could instead remove the items from either end of the list until there are none left. Although slightly less efficient, it has the advantage of the list always being in a consistent state, and is much more concise and readable.

 void remall(CamelDList *list)
 {
        struct _MyNode *node;
 
        while ((node = (struct _MyNode *)camel_dlist_remhead(list)))
                free_mynode(node);
 }

Camel.SList

This provides the same API, but for a single-linked list. Here no separate header is used, although internally the list head pointer is sometimes treated like a data-less node, to simplify the code.

Although the API is identical, the tradeoff is that by saving 2 pointers per list header and 1 pointer per node, only the addhead and remhead functions are O(1), all the others are O(n). The remove function also requires the list pointer.

 struct _CamelSListNode {
        struct _CamelSListNode *next;
 };
 
 struct _CamelSList {
        struct _CamelSListNode *head;
 };
 
 #define CAMEL_SLIST_INITIALISER(l) { 0 }
 
 void camel_slist_init(CamelSList *l);
 CamelSListNode *camel_slist_addhead(CamelSList *l, CamelSListNode *n);
 CamelSListNode *camel_slist_addtail(CamelSList *l, CamelSListNode *n);
 CamelSListNode *camel_slist_remove(CamelSList *l, CamelSListNode *n);
 CamelSListNode *camel_slist_remhead(CamelSList *l);
 CamelSListNode *camel_slist_remtail(CamelSList *l);
 int camel_slist_empty(CamelSList *l);
 int camel_slist_length(CamelSList *l);
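The cost difference is visible in even a tiny sketch. The code below uses a pointer-to-pointer walk, which amounts to the same trick as treating the head pointer like a data-less node; the names are hypothetical and this is not the Camel implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchSNode {
    struct SketchSNode *next;
} SketchSNode;

typedef struct SketchSList {
    SketchSNode *head;
} SketchSList;

/* O(1): just relink the head. */
SketchSNode *sketch_slist_addhead(SketchSList *l, SketchSNode *n)
{
    assert(l != NULL && n != NULL);
    n->next = l->head;
    l->head = n;
    return n;
}

/* O(n): the whole list is walked to find the last link. */
SketchSNode *sketch_slist_addtail(SketchSList *l, SketchSNode *n)
{
    SketchSNode **pp;

    assert(l != NULL && n != NULL);
    for (pp = &l->head; *pp != NULL; pp = &(*pp)->next)
        ;
    n->next = NULL;
    *pp = n;
    return n;
}

/* O(n), and the list pointer is required: with no prev pointer the
 * link that points at n can only be found by walking from the head.
 * Returns NULL if n is not on the list. */
SketchSNode *sketch_slist_remove(SketchSList *l, SketchSNode *n)
{
    SketchSNode **pp;

    assert(l != NULL && n != NULL);
    for (pp = &l->head; *pp != NULL && *pp != n; pp = &(*pp)->next)
        ;
    if (*pp == NULL)
        return NULL;
    *pp = n->next;
    return n;
}
```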

Camel.NetUtils

Camel.NetUtils provides basic hostname lookup functions that simulate the getaddrinfo and getnameinfo interfaces if they are not available in the underlying operating system.

They are not merely wrappers around these functions; in addition, they provide a thread-driven, cancellable (using Evolution/Camel.Operation) name resolution interface, and return meaningful Evolution/Camel.Exception values where possible.

 struct addrinfo *camel_getaddrinfo(const char *name, const char *service,
                                    const struct addrinfo *hints, struct _CamelException *ex);
 void camel_freeaddrinfo(struct addrinfo *host);
 
 int camel_getnameinfo(const struct sockaddr *sa, socklen_t salen, char **host, char **serv,
                       int flags, struct _CamelException *ex);
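For readers who have not used the underlying interfaces, here is a small sketch using plain POSIX getaddrinfo and getnameinfo, not the Camel wrappers. It parses a numeric address and service and formats them back, so no DNS lookup takes place; sketch_numeric_roundtrip is a hypothetical helper, and the cancellation and exception handling that Camel adds are not shown.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: parse a numeric address and port with
 * getaddrinfo(), then turn the result back into strings with
 * getnameinfo().  Returns 0 on success or an EAI_* error code. */
int sketch_numeric_roundtrip(const char *addr, const char *port,
                             char *host, size_t hostlen,
                             char *serv, size_t servlen)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    assert(addr != NULL && port != NULL && host != NULL && serv != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;        /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;   /* no lookups */

    rc = getaddrinfo(addr, port, &hints, &res);
    if (rc != 0)
        return rc;

    rc = getnameinfo(res->ai_addr, res->ai_addrlen,
                     host, (socklen_t)hostlen,
                     serv, (socklen_t)servlen,
                     NI_NUMERICHOST | NI_NUMERICSERV);
    freeaddrinfo(res);                  /* the result list is always freed */
    return rc;
}
```

The same family-independent sockaddr handling is what lets one code path serve both IPv4 and IPv6.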

This replaces the older pre-2.0 code, which tried to use gethostbyname and had a bit of a mess trying to implement IPv6 in a clean way.

Camel.UrlScanner

This is an object used by the Evolution/Camel.MimeFilter#Camel.MimeFilterToHtml filter to scan for URLs inside of a text stream. It uses a trie as a pattern matching engine and calls callbacks when patterns are matched.

It should probably be considered an internal processor; it isn't documented in any useful way.

Camel.CertDB

The camel certdb stuff was originally created because the NSS CertDB interfaces didn't appear to work as expected.

I'm not exactly sure why; they work fine for the S/MIME code. There are no comments to shed light on the matter.

So Camel.CertDB has basically just become a hash table of certificates, stored on disk, which records whether the user has chosen to override a certificate's failure (for whatever reason).

Don't use it in new code.

Camel.JunkPlugin

This isn't really a plugin, but the idea was for it to become one at some point. Perhaps it should have been CamelJunkDriver. It is used by CamelFilterDriver to perform junk tests - at least its interface is.

It should be a proper CamelObject, but it is just a structure.

The Evolution junk filter plugin, EMJunkPlugin, creates one of these callback structures to implement pluggable junk testers, but it hides the details entirely from the plugin code.

Since this is another undocumented object that I didn't write, I won't document it here.

Camel.Lock

Locking is notoriously messy on Unix mail systems: programs often use incompatible locking schemes or no locking at all, or the locking fails on the given filesystem.

To try to simplify this, in a not-entirely-successful way, Camel allows the compiler of the software to set what locking schemes are to be tried - and the code will then try all of them.

It provides 3 separate locking implementations, and if a given implementation has been compiled out, it will just NOOP and return success.

 typedef enum {
        CAMEL_LOCK_READ,
        CAMEL_LOCK_WRITE,
 } CamelLockType;
 
 int camel_lock_dot(const char *path, CamelException *ex);
 int camel_lock_fcntl(int fd, CamelLockType type, CamelException *ex);
 int camel_lock_flock(int fd, CamelLockType type, CamelException *ex);
 
 void camel_unlock_dot(const char *path);
 void camel_unlock_fcntl(int fd);
 void camel_unlock_flock(int fd);
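As an illustration of one of the three schemes, the sketch below takes a non-blocking, whole-file advisory lock with plain fcntl(2). The names are hypothetical, there is no exception reporting, and any policy Camel applies about which errors to tolerate is left out.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

typedef enum {
    SKETCH_LOCK_READ,
    SKETCH_LOCK_WRITE
} SketchLockType;

/* Hypothetical helper: try to lock the whole file without blocking.
 * Returns 0 on success, -1 (with errno set) on failure. */
int sketch_lock_fcntl(int fd, SketchLockType type)
{
    struct flock fl;

    assert(type == SKETCH_LOCK_READ || type == SKETCH_LOCK_WRITE);
    memset(&fl, 0, sizeof(fl));
    fl.l_type = (type == SKETCH_LOCK_READ) ? F_RDLCK : F_WRLCK;
    fl.l_whence = SEEK_SET;     /* l_start = 0, l_len = 0: entire file */
    return fcntl(fd, F_SETLK, &fl) == -1 ? -1 : 0;
}

/* Hypothetical helper: release whatever lock this process holds on fd. */
void sketch_unlock_fcntl(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    (void)fcntl(fd, F_SETLK, &fl);
}
```

Locks of this kind are advisory and per-process, so they only protect against other programs that also use fcntl locking on the same file.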

So unless you're building some specific locking system, you should just use camel_lock_folder, which will try all locking mechanisms. If any one of the locking mechanisms fails it will just ignore it. This seems like a bad idea, but without this, fcntl(2) locking failures over NFS will cause pointless failures. It will also clear stale 'dot' locks, and retry them if they are still active.

 int camel_lock_folder(const char *path, int fd, CamelLockType type, CamelException *ex);
 void camel_unlock_folder(const char *path, int fd);

Locking spools

In addition to these standard locking mechanisms, which are used for local mailbox folders, another special locking mechanism is used for locking system spool files. For this we use a combination of a helper process, called camel-lock-helper, which runs with just enough privileges to lock the spool file, and an API which invokes and manages that process. The helper only performs file-based 'dot' locking; it creates a 'mailbox.lock' file in the same directory as the 'mailbox' file, and keeps it refreshed while the client still requires it locked. It also cleans up stale locks automatically.

These functions have an alternative locking API. The first returns a lock identifier, and the second unlocks it.

 int camel_lock_helper_lock(const char *path , CamelException *ex);
 int camel_lock_helper_unlock(int lockid);

The privileges required are defined in the configure script and assigned by the Makefile. As a side-effect, installs as a user other than root may not install it with sufficient privileges to check local spool mail.

Do we need to cover more about how camel-lock-helper works?

Apps/Evolution/Camel.Misc (last edited 2013-08-08 22:50:10 by WilliamJonMcCann)