WAI-ARIA Automated Testing

This page outlines how the W3C WAI-ARIA specification is tested in terms of its implementation in web browsers, and how that testing could be automated. At present, testing is manual and tedious. The proposal is to automate the process. While the overall goal is to "automate ARIA testing", it can be broken down into sub-tasks that are suitable for students. I outline these below.

First, however, a description of the subject matter, and how testing proceeds manually.

WAI-ARIA

WAI-ARIA is a W3C Candidate Recommendation that defines a set of attributes that, when added to markup, inform the browser how to expose that markup to a11y APIs. For the remainder of this proposal, the term "ARIA" is used. The details of how browsers are to publish ARIA markup are given in the User Agent Implementation Guide, hereafter "UAIG".

These two documents form the basis for ARIA testing.

Testing

The purpose of testing ARIA is to achieve a version 1.0 release of the specification. In W3C-speak, this is known as a "W3C Recommendation". The requirement is to show that each MUST or MUST NOT in the main specification document and in the UAIG has at least two implementations.

To make this concrete, consider one feature of the spec, for example, the ARIA role "alertdialog". Testing the implementation means showing that a browser maps the ARIA role to the appropriate role in the a11y API (AT-SPI2, in the case of GNOME). According to the UAIG, the ARIA "alertdialog" role is mapped by the browser to the AT-SPI2 ROLE_DIALOG role.

Each feature of the spec needs to be so tested. A "feature of the spec" includes the ARIA roles, and the ARIA states and properties. The UAIG has tables documenting how these roles, states, and properties are mapped to various a11y APIs. There is a "Role mapping" section describing general rules about roles and how they are mapped, as well as a table defining individual mappings. Similarly, the State and Property section documents how ARIA states and properties are handled, and includes a mapping table for them.
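To illustrate, the role mapping table amounts to a lookup from ARIA role to a11y API role. The sketch below hardcodes a few AT-SPI2 entries; only the "alertdialog" entry comes from this page, and the others are examples of the same pattern that should be verified against the UAIG table before use:

```python
# Illustrative excerpt of the UAIG role-mapping table for AT-SPI2.
# Only "alertdialog" -> ROLE_DIALOG is stated on this page; the other
# entries are assumed and should be checked against the UAIG.
ARIA_TO_ATSPI2_ROLE = {
    "alertdialog": "ROLE_DIALOG",
    "alert": "ROLE_ALERT",
    "checkbox": "ROLE_CHECK_BOX",
    "button": "ROLE_PUSH_BUTTON",
}

def expected_atspi2_role(aria_role):
    """Return the AT-SPI2 role a browser is expected to expose,
    or None if the ARIA role is not in this (partial) table."""
    return ARIA_TO_ATSPI2_ROLE.get(aria_role)
```

A full harness would generate this table from the UAIG rather than hardcode it.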

For each feature, there is a test case in a W3C database that consists of:

  • a statement regarding an expected result, e.g., "Element with id 'test' having role alertdialog: The AAPI object has role ROLE_DIALOG".
  • a test html file with the proper markup for the test.
  • a way for a tester to indicate whether the test passed.

The test cases are currently under construction by the W3C. Note that construction of test cases is outside the purview of this proposal, which assumes the existence of the test cases. The test html files are publicly viewable. Keeping with the alert dialog example, the test file for the "alertdialog" role is the one titled "Plain <div> with role 'alertdialog' and no states or properties". Unfortunately, at this point, the testable statements are not publicly available.

Given the number of features within ARIA, it is expected that there will be on the order of thousands of test cases. For example, the ARIA 1.0 Candidate Recommendation Implementations page shows a list of the testable statements that were automatically generated from the specification. Other statements are being written by hand.

Accerciser

How does a tester confirm the expected result? A tester must load the test html file for a test case into a browser, and then inspect what the browser publishes to the a11y API. Accerciser is the a11y API inspector used on the GNOME desktop. Here is an outline of the steps a tester goes through to confirm proper mapping of ARIA information to the a11y API:

  1. Find the relevant test case and note the expected outcome.
  2. Find the associated test html document and load it into Firefox.

  3. Switch to Accerciser.
  4. Locate the Firefox subtree in Accerciser's tree view, specifically the subtree that represents the test html document window.

  5. Locate the accessible object in the tree view that corresponds to the relevant element in the html.
  6. Use Accerciser's "Interface view" to check the accessible object's role or properties, as appropriate for the test, to determine the actual outcome.
  7. Confirm whether the actual outcome matches the expected, and report the result.
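Steps 4 and 5 amount to a search of the accessible tree that Accerciser displays. Real code would walk pyatspi accessibles; the sketch below uses a minimal stand-in node class so the search logic itself is clear (the class and function names are my own, not Accerciser API):

```python
from collections import deque

class Node:
    """Minimal stand-in for a pyatspi accessible object."""
    def __init__(self, role, name="", children=None):
        self.role = role
        self.name = name
        self.children = children or []

def find_accessible(root, role, name=None):
    """Breadth-first search for the first accessible matching the
    given role (and, optionally, name) -- e.g. first the document
    frame of the test html file, then the element under test."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node.role == role and (name is None or node.name == name):
            return node
        queue.extend(node.children)
    return None
```

With pyatspi, I believe the equivalent walk would start from pyatspi.Registry.getDesktop(0) and compare each accessible's getRoleName() against the expected role.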

Automation

This section outlines the sub-tasks required to automate the manual testing process described above. The details of how to implement each sub-task remain to be worked out. However, a sub-task or group of sub-tasks might qualify as a project for GNOME's Outreach Program for Women.

  1. A process for selecting a test case, and noting the expected result.
    • A sub-task here is to develop a data structure that encodes the expected result. Possibilities include XML and JSON. My preference is JSON, since it can easily be read by Python (the language Accerciser is written in) to create an object, allowing object comparison as a way to check expected vs. actual.
  2. A process to locate the associated html test file and load it into Firefox.

  3. An addition to Accerciser to locate the branch of its a11y tree that corresponds to the html element(s) just loaded into Firefox.

    • Here, "an addition" means either adding built-in code to Accerciser to perform these operations, or implementing an Accerciser plugin.
  4. An addition to Accerciser that compares the expected result against the actual result.
    • Again, "an addition" means either built-in code or a plugin.
  5. A process (an addition to Accerciser?) that records the outcome: pass vs. fail (vs. indeterminate?).
    • It would be ideal to hook this into the W3C's test harness and actually record the result on their server, but I doubt such access will be allowed for security reasons. Instead, the results should be stored elsewhere, in a form that would allow someone with access to the W3C test harness to upload them.
    • Another data format is useful here, one that encodes the results. We should check whether the W3C already defines one, or what would be required for an easy upload of results to their server.
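To make sub-tasks 1, 4, and 5 concrete, here is a sketch of checking and filling in an expected-result record, using the JSON structure suggested under "Other thoughts" below. The verdict logic (when to report "indeterminate") is my assumption:

```python
import json

# Sample testable-statement record, following the JSON sketch under
# "Other thoughts" below.
RECORD = json.dumps({
    "element": "div", "id": "test", "testing": "role",
    "role": "alertdialog", "expected": "ROLE_DIALOG",
    "actual": "", "success": "",
})

def check_result(record_json, actual):
    """Fill in 'actual' and 'success' given what the a11y API reported.
    An empty actual value means nothing could be read: indeterminate."""
    record = json.loads(record_json)
    record["actual"] = actual
    if actual == "":
        record["success"] = "indeterminate"
    elif actual == record["expected"]:
        record["success"] = "true"
    else:
        record["success"] = "false"
    return record
```

In Accerciser, the "actual" value would come from the accessible object located in sub-task 3.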

Other thoughts

  • Here's a rough idea for a JSON data structure that encodes a testable statement. In English, the testable statement is: "A <div> element with id='test' and an ARIA role of 'alertdialog' is mapped to the ROLE_DIALOG AT-SPI2 role":

  {
    "element": "div",             // together with the "id" field below, identifies the relevant html element.
    "id": "test",
    "testing": "role",            // what to check. Could also be "state", "property", or "name". Others?
    "role": "alertdialog",        // the ARIA value of what to check.
    "expected": "ROLE_DIALOG",    // the value here should be the corresponding constant from, say, pyatspi.
    "actual": "",                 // to be filled in by the testing harness.
    "success": ""                 // to be filled in: "true", "false", or "indeterminate".
  }
  • IBM has developed an XML format for the above for use with AccProbe and automated testing of IAccessible2 on Windows. Should check the licence of that format and see what we can reuse from it.

  • Is any of Brian Nitz's automated testing harness useful here?
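On the results-format question above: until the W3C defines one, a simple one-JSON-object-per-line log would be enough to hand off for manual upload. A sketch (the field names are my own, not a W3C format):

```python
import json

def append_result(path, statement_id, expected, actual, success):
    """Append one test outcome as a JSON line; a file of these can
    later be given to someone with W3C test-harness access to upload."""
    entry = {
        "statement": statement_id,  # hypothetical identifier for the testable statement
        "expected": expected,
        "actual": actual,
        "success": success,
    }
    with open(path, "a") as log:
        log.write(json.dumps(entry) + "\n")
```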


This work is part of the ÆGIS (Ontario) Project. It is funded and supported by the Ontario Ministry of Economic Development and Innovation and the ÆGIS (Europe) Project.

Accessibility/AriaAutomatedTesting (last edited 2012-08-03 20:09:37 by JosephS)