GXml

GXml is an introspectable GObject API for XML programming. Most functionality is provided by wrapping libxml2. It endeavours to comply as closely as it can with the DOM Level 1 Core API (and eventually Level 2, Level 3, etc). In the future, it may provide its own native implementation, as well as a SAX API, an XPath API, etc; patches welcome. This summer, my mentor is Owen Taylor.

This summer is dedicated to polishing GXml and addressing issues that prevent users from using it. Planned tasks include:

  • fixing memory management, and moving to a model where the Document alone owns all nodes it creates or holds within its tree. DONE

  • fixing names (API breaks!), most prominently GXmlDomNode -> GXmlNode. DONE

  • fixing error handling: rather than continuing to misuse GError, we'll switch to printing warnings with g_warning () when a program tries something it shouldn't. DONE

  • providing more examples in C and JavaScript. DONE

  • improving documentation (fix doc bugs; add more useful information). DONE

  • patching select projects (identified as yelp, glade, and libgdata) to further exercise the library. IN PROGRESS

  • squashing Bugzilla bugs. MOSTLY DONE

  • measuring performance. DONE

Memory Management

GXml's reference handling was previously a mess. It's written in Vala, which makes it a little too easy to be thoughtless about references. Memory issues have been addressed in two parts:

1. Valgrind and tests

Valgrind and gdb were used to better understand where memory was used, what was the fault of GXml (e.g. reference cycles, and failing to free a lot of libxml2-allocated memory), and what was just a byproduct of GType et al. The results were used to create a collection of .supp suppression files, kept under tests/valgrind/, that help locate actual memory leaks. Consequently, most libxml2 memory leaks were squashed early this summer.

2. Changing our ownership model

Previously, the caller owned any node it created. If you inserted that node into the Document's tree, then both you and the Document held a reference to it, and you'd have to unref both the document and your created node. However, for the purposes of GXml, nodes only make sense while their document exists (yes, we can imagine situations where you might want Just A Node or part of a tree). So, to greatly simplify reference handling, and with input from my mentor Owen Taylor, the Document alone now owns references to GXml nodes. The user only unrefs the GXmlDocument that owns them. Methods that used to return strong references now return weak references.

GXmlDocument is also responsible for freeing the libxml2 memory we allocate. (It just makes sense.)

We decided against adding a gxml_document_delete_node () operation.
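The ownership model can be sketched in plain C, without GObject. This is only an illustration of the idea, not GXml's implementation: the Doc and Node types and the doc_new, doc_create_node, and doc_free functions are hypothetical names made up for this sketch. The document keeps a list of every node it creates, hands out borrowed pointers, and releases everything in one place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration only; this is not the GXml API. */
typedef struct Node {
    char *name;
    struct Node *next;   /* next node in the owning document's list */
} Node;

typedef struct Doc {
    Node *owned;         /* every node this document has created */
    size_t n_owned;      /* how many nodes are in that list */
} Doc;

/* Creates an empty document; returns NULL if allocation fails. */
Doc *doc_new(void) {
    return calloc(1, sizeof(Doc));
}

/* Creates a node and returns a borrowed pointer to it.
 * The document keeps ownership, so the caller never frees the node. */
Node *doc_create_node(Doc *doc, const char *name) {
    assert(doc != NULL && name != NULL);

    Node *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;

    size_t len = strlen(name) + 1;
    node->name = malloc(len);
    if (node->name == NULL) {
        free(node);
        return NULL;
    }
    memcpy(node->name, name, len);

    /* Record the node in the document's list of owned nodes. */
    node->next = doc->owned;
    doc->owned = node;
    doc->n_owned++;
    return node;
}

/* Frees the document and every node it owns.
 * Returns the number of nodes that were freed. */
size_t doc_free(Doc *doc) {
    size_t freed = 0;
    Node *node = doc->owned;
    while (node != NULL) {
        Node *next = node->next;
        free(node->name);
        free(node);
        node = next;
        freed++;
    }
    free(doc);
    return freed;
}
```

In this sketch a caller would create nodes with doc_create_node, use the returned pointers only while the document is alive, and call doc_free once at the end. That mirrors the rule above that the user only unrefs the document.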

Name changes, API changes

GXmlDomNode -> GXmlNode

It used to be GXmlDomNode for the benefit of Vala programming. Because there's a GLib.Node, you'd always have to write GXml.Node. Changing away from Node allowed us to avoid having to explicitly specify the namespace. However, that resulted in an uglier API for C, which already requires more typing, and you'd end up with common, but ugly, functions like gxml_dom_node_get_node_name (). So the rename is a small cosmetic benefit that also helps our node class better comply with the DOM Level 1 Core spec.

DomError -> DomException

This is mostly for DOM spec compliance. It was named DomError only to fit in with GError naming, but because it's not a GError domain anymore (see error handling changes below), it now just follows the proper name.

Error Handling

API change

Near the start, the question of error handling was raised. GErrors are meant for capturing runtime problems, not programmer errors. Reviewing[0] the possible DOM Exceptions specified, most seemed like programmer errors, so it was decided to switch to g_warning ()s instead. This also has the added benefit of shrinking the API for C users.

0. http://blog.kosmokaryote.org/2013/06/technology-domexception-its-nature-and.html

Completing error checking

Most exception cases from the spec weren't being checked properly, or weren't being checked at all. Consequently, a bunch of work has now been done to check those cases. They were identified in a bug[0] and I believe all but one are now checked and generate a warning. They generally do not instantly fail or return NULL though, as in many cases, libxml2 actually supports the errant behaviour. There is also a mechanism to determine what the last error reported was: GXml.last_error, which is a DomException value (where DomException is an enum). Developers can test GXml.last_error to see if everything is alright (== DomException.NONE) or, if there is an error, find out what type. Classic Unix.

0. https://bugzilla.gnome.org/show_bug.cgi?id=703070
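The "warn, record, and carry on" pattern described above can be sketched in plain C. This is not GXml code: the enum values, the last_error variable, the dom_warn helper, and the substring_data function are hypothetical names for this sketch, loosely modelled on the INDEX_SIZE_ERR case from the DOM Level 1 spec. The sketch prints to stderr where a GLib program would call g_warning (), and it assumes that clamping a bad offset is an acceptable way to continue.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error codes for illustration; not the GXml enum. */
typedef enum {
    DOM_EXCEPTION_NONE = 0,
    DOM_EXCEPTION_INDEX_SIZE = 1
} DomException;

/* Last error reported, in the spirit of errno. */
DomException last_error = DOM_EXCEPTION_NONE;

/* Records the error and prints a warning instead of aborting. */
static void dom_warn(DomException code, const char *message) {
    last_error = code;
    fprintf(stderr, "WARNING: %s\n", message);
}

/* Copies up to `count` characters of `data`, starting at `offset`,
 * into `out` (which holds `out_size` bytes, including the terminator).
 * A bad offset is a programmer error: we warn, record it in last_error,
 * clamp the offset, and continue. Returns the number of characters copied. */
size_t substring_data(const char *data, size_t offset, size_t count,
                      char *out, size_t out_size) {
    assert(data != NULL && out != NULL);

    size_t length = strlen(data);
    last_error = DOM_EXCEPTION_NONE;

    if (out_size == 0)
        return 0;

    if (offset > length) {
        dom_warn(DOM_EXCEPTION_INDEX_SIZE, "offset exceeds data length");
        offset = length;   /* clamp so the call still completes */
    }

    size_t available = length - offset;
    if (count > available)
        count = available;
    if (count > out_size - 1)
        count = out_size - 1;

    memcpy(out, data + offset, count);
    out[count] = '\0';
    return count;
}
```

A caller would invoke substring_data and then compare last_error against DOM_EXCEPTION_NONE, much as the text describes testing GXml.last_error against DomException.NONE.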

TODO

  • Still want more negative tests.

More Examples

The usage examples were expanded and corrected. C developers should have a better idea of how to manage memory. JS developers should see more usage of properties.

I'm still making a JavaScript bookshelf application, but it's harder than I expected.

Documentation

Documentation is generated with Valadoc. The generated gtk-doc has issues. Some information found in the valadoc is absent from the gtk-doc (like hierarchy). Also, there are strange formatting issues (the table of contents is missing dashes for some classes). Some fixes have been made so far. Also, we want to link to the part of the DOM spec that each element is defined from, so a developer can refer back to it. This has been widely done this summer, but some of the links go to non-unique anchors and need to be updated.

Limitations in Valadoc have prevented us from being picky about formatting.

Patch projects

To popularise GXml, we'll provide patches! So far, tentative patches have been written for three projects:

  • yelp
  • glade
  • libgdata

The patches are not yet suitable for submission due to API instability in GXml and changes in semantics (memory in particular). Most likely, the current patches will be replaced by new ones soon.

TODO

  • revise patches and push them upstream

Bugzilla

A bug-hunting I do go.

https://bugzilla.gnome.org/buglist.cgi?query_format=advanced;bug_status=UNCONFIRMED;bug_status=NEW;bug_status=ASSIGNED;bug_status=REOPENED;bug_status=NEEDINFO;product=gxml;known_name=gxml%20open;query_based_on=gxml%20open

In particular, there are two patches floating around, one for XPath support by Adam Ples, and one for a new approach to serialization by Daniel Espinosa. Sadly, both of these will probably need changes due to the changes described above. :P

Measure Performance

To see how GXml performs compared to libxml2 and what the penalty for wrapping it is, look here: http://blog.kosmokaryote.org/2013/09/gnome-final-report-for-gxml-in-2013.html

In short, memory use increases by 15-20%, and the time penalty varies with the size of the file. The smaller the file, the more noticeable the time difference is, reaching around 50% when loading smaller files. With larger files, the difference is obscured by all the time libxml2 takes to parse them.
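For reading those percentages, here is a minimal sketch of how a relative overhead can be computed from two measurements. It assumes the figures are relative to the plain libxml2 baseline; the exact methodology is in the linked report, and overhead_percent is a hypothetical helper, not part of GXml or its tests.

```c
#include <assert.h>

/* Relative overhead of `wrapped` over `baseline`, in percent.
 * Both values must use the same unit (e.g. seconds or bytes),
 * and `baseline` must be positive. */
double overhead_percent(double baseline, double wrapped) {
    assert(baseline > 0.0);
    return (wrapped - baseline) / baseline * 100.0;
}
```

With made-up numbers, a load that takes 2 ms through libxml2 alone and 3 ms through a wrapper would come out as a 50% overhead, while the same 1 ms of extra work on top of a 100 ms parse would be only 1%.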

Outreach/SummerOfCode/2013/Projects/RichardSchwarting_GXml (last edited 2013-12-03 18:34:15 by WilliamJonMcCann)