Orca "Phase 2"


We're now at GNOME 2.24. What we're seeing is that some of the bugs and feature requests are causing us to bump into design choices we made long ago. We don't regret making these choices, since we could have easily suffered from "analysis paralysis" rather than making decisions based upon what we knew at the time. However, we now sometimes spend a lot of time working around the design choices in order to fix a bug or implement a new feature. Furthermore, the AT-SPI/D-Bus work is coming to a point where we can start using it, so it might require some changes in Orca as well. In addition, different disability groups, such as people with dyslexia and other learning disabilities, think Orca could be modified to suit their needs. Finally, the code is difficult for newcomers, so we need to make it more approachable.

This seems like a good time to reflect upon where we are, what we've learned since GNOME 2.16, and where we want to go. This page is meant to be a somewhat freeform page to capture our thoughts. As we progress, we can formalize things into bugs and such. For now, however, let's just jot down some ideas.

Goals

This is a list of things we've learned as well as stuff we want to do. We should attempt to cover these in the refactor.

Eliminate global locusOfFocus

The original design of Orca called for different presentation managers. The original two were the focus tracking presentation manager and the hierarchical presentation manager. The notion was that you could switch between the two quickly, and they'd both need a reference to the object with focus. As such, we maintain a global locus of focus state. Over time, the hierarchical presentation manager was deprecated and we now live with the complexity of dealing with the global focus state. For example, a new locus of focus detected by a script results in a call to orca.py, which then calls the presentation manager, which then calls back into the script.

Let's eliminate the global locus of focus and have each script maintain its own notion of locus of focus. This could be customized somewhat per script (perhaps an instance of a LocusOfFocus class?), allowing for handling of special application and/or toolkit behavior. For example, focus might be on a table with an active descendant. Or, focus might be in a text area with a given caret position. Etc. Note also that this might mean changing the focus_tracking_presenter into a script_manager.
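
To make the idea a bit more concrete, here is a minimal sketch of a script that owns its own locus of focus instead of consulting a global. The LocusOfFocus class name comes from the paragraph above; its fields (accessible, caret_offset, active_descendant) and the set_locus method are assumptions made up for illustration, not existing Orca API.

```python
class LocusOfFocus:
    """Hypothetical per-script record of what the user currently cares about."""

    def __init__(self, accessible=None, caret_offset=None, active_descendant=None):
        self.accessible = accessible                # the object of interest
        self.caret_offset = caret_offset            # for focus inside a text area
        self.active_descendant = active_descendant  # for tables, trees, etc.

    def __eq__(self, other):
        return (isinstance(other, LocusOfFocus)
                and self.accessible == other.accessible
                and self.caret_offset == other.caret_offset
                and self.active_descendant == other.active_descendant)


class Script:
    """Stand-in for an Orca script that keeps its own locus of focus."""

    def __init__(self):
        self.locus = LocusOfFocus()

    def set_locus(self, new_locus):
        """Store the new locus and report whether it actually changed."""
        changed = new_locus != self.locus
        self.locus = new_locus
        return changed
```

A toolkit or application script could subclass LocusOfFocus to add whatever extra state it needs, and nothing would have to round-trip through orca.py.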

Handle per-script settings better

The global settings module was originally designed without the plan to do per-script settings. We shoe-horned the per-script settings into Orca in a rather hacky way in order to allow references to the settings module to persist throughout the code (i.e., we chose to do a more isolated change than make a larger global change).

Let's consider making each script have its own handle to its own settings and allow those settings to defer/delegate to global settings.

Provide formatting strings for speech and braille

Right now, the generators hardcode the information presented and the order in which it is presented to the users. Let's consider reducing the hardcoding in the generators and make them driven by formatting strings that define the type and order of information to present to the user. For example: "%l %v %r" might indicate present the label, the value, and the role. This thought may or may not work and needs a lot more fleshing out. If achievable, however, the user could potentially override the strings.
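
As a rough sketch of how the "%l %v %r" idea might work, the snippet below expands a format string against a dictionary of properties. The code letters come from the example above; the FORMAT_CODES table, the expand_format function, and the property names are hypothetical and only meant to show the shape of the approach.

```python
import re

# Hypothetical mapping from format codes to the property they request.
FORMAT_CODES = {"l": "label", "v": "value", "r": "role"}


def expand_format(format_string, properties):
    """Replace each %<code> with the matching property; missing ones vanish."""
    def substitute(match):
        key = FORMAT_CODES.get(match.group(1))
        if key is None:
            return match.group(0)  # leave unknown codes untouched
        return str(properties.get(key, ""))

    expanded = re.sub(r"%(\w)", substitute, format_string)
    # Collapse whitespace left behind by properties that were not available.
    return " ".join(expanded.split())
```

With this in place, a user who does not want roles spoken would only need to override the string with "%l %v"; the generator code itself would not change.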

Related to this, we should also work with the learning disability community to determine what spoken information is useful to them and provide a means to turn it on and off.

Jon: I agree with Joanie's idea of rolestuff.py, I see it as follows

Scenario: We land on an item:

1. The messageDispatcher calls rolestuff.py with the object.

2. The specific role function returns a dictionary of properties including a possible overridden speech and braille verbosityFormatString, and a textAttributesFormatString. (See note A) We would not have utterances and regions, but simply regions with properties. (See note B)

3. We can do clumping of properties/attributes at this stage. (see note C)

4. The messageDispatcher continues by replacing each format option by the desired value/structure. This is done for both the braille and speech verbosityStrings. It will also account for the textAttributesFormatString, and insert the desired attributes. (see note D)

5. Once done, we pass along the speech and braille regions to the respective output functions, and it is the output module's job to present the information correctly to the user. For example: The speech would say "orca start bold is a great end bold project" whereas the braille would simply underline "is a great".

Scenario readAll:

1. The messageDispatcher lives at the heart of the output process, so we replace the current sayAll in speech with a readAll loop in the messageDispatcher. Inside the loop, we call the speech output on that region, we output the braille for the region, and we move the magnification to the region. Hopefully this gives us a better chance to keep the output modules in sync. Inside the readAll loop, we also have a delay timer to account for people's various reading speeds. (Yes, the speech may have finished reading the region, but the user may wish to read the region him/herself using magnification or braille, hence the need for the delay.) Having readAll functionality regardless of speech output is also very useful, saving the user many keystrokes if they have set their chosen reading speed.

2. Orca keys are associated with <backward> and <forward>, which wake up the timer and output the previous/next region. (The intention is to give the user speed-reading functionality, as seen in JAWS: when a readAll is performed, they can press left Shift/right Shift to read the previous/next region.)

Note A: For dyslexic users, the verbosity format strings would not include the state, only the name and value. They can see it's a checkbox and that it is checked, but would like Orca to read the label.

Note B: Indeed, some of the properties could be speech related; hence ACSS is just another field returned by the generator and passed on to the speech output module. This will also enable us to do on-the-fly language switching: if we somehow detect that "hej jag heter Jon" is written in Swedish, and a Swedish voice is available, we will use it.

Note C: This will enable us to perform searches based on attributes. A feature that has been wanted for some time.

Note D: This will enable us to add support for proofreading abilities, since the only difference between the normal mode and proofreading mode is the textAttributesFormatString, which selects which attributes we wish to be exposed to the braille/speech.

Better consider the braille-only user

We have a number of features that are speech-only right now (e.g., where am I, the tutorial messages, etc.). We need to determine how to better support the braille user.

Allow the user to easily override or eliminate keybindings

Right now, you always get Orca keybindings and they are hardcoded. Overriding them is very cumbersome. In addition, some users have requested that Orca not use any keybindings at all. As a result, we might consider making keybindings something that is done more in a customizable table than in code, and allow these keybindings to be offered as "overlays" or some such thing. This might also help with the problem of translating the mnemonic-style keybindings of Orca (e.g., Insert+s for speech).
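
One way to picture the "customizable table plus overlays" idea is a plain dictionary of bindings with user overlays applied on top. This is only a sketch: the command names, the key-name strings, and the convention that None removes a binding are all assumptions for illustration.

```python
# Hypothetical key names and command names, for illustration only.
DEFAULT_BINDINGS = {
    "Insert+s": "toggleSpeech",
    "KP_8": "reviewCurrentLine",
}


def apply_overlays(base, *overlays):
    """Return a new table; later overlays win, and None removes a binding."""
    table = dict(base)
    for overlay in overlays:
        for key, command in overlay.items():
            if command is None:
                table.pop(key, None)
            else:
                table[key] = command
    return table


# A user overlay that moves the speech toggle and drops the keypad binding.
user_overlay = {"Insert+s": None, "Insert+t": "toggleSpeech", "KP_8": None}

# An overlay for users who want Orca to claim no keys at all.
no_bindings = {key: None for key in DEFAULT_BINDINGS}
```

A translated overlay could be shipped the same way, which might help with the mnemonic problem mentioned above.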

Consider moving keybindings to the modules with which they are associated

This may be irrelevant depending on what we do w.r.t. the previous item. However.... Right now, all of the keybindings -- with the exception of those for structural navigation -- are defined in the scripts. Would it make more sense to, say, have the flat-review related keybindings in flat_review.py, the magnification keybindings in mag.py, etc.?

Make functionality use generators where possible

The where am I module is kind of its own separate beast. We should try to integrate it with the generators as much as we can. We may also try to determine how to present this information to braille users.

Joanie: The default script's locusOfFocusChanged() perhaps would be another candidate?

Joanie: I just did special handling in flat_review.py for combo boxes too. Another candidate for generators?

Joanie: That said, see the next item. :-)

Should rolenames.py be rolestuff.py??

Joanie: As a general rule, for any given accessible of ROLE_FOO, we presumably want to present the same basic information about that object, regardless of whether we're in StarOffice, Gecko, or Gedit. That gives the user the most consistent experience. Similarly, what we present when locusOfFocus changes is not the same as what we present when doing flat review, nor is it the same as what we present when doing whereAmI. However, the building blocks (names, roles, displayed text, selected state) largely are.

If each script had its own rolestuff.py through which it could return the contents being displayed in a combo box, the number of items in a list, the coordinates to be presented for a table cell, etc. to whatever module happened to be asking for that information, I would think that there'd be far less of a need for the scripts' custom speech and braille generators (and for that matter, the custom where_am_i's). Maybe this would also help accomplish the goal of the next item (Provide ACSS per role and other things)? And perhaps it could also be the place where the "short term (per event) caching" gets done?

Scenario: User presses Tab to move to a table cell.

Response: The default speech generator asks rolestuff.py for the header, the contents, the coordinates etc. Rolestuff does its script-specific magic foo to obtain this information, hands it back to the speech generator, and tucks it away for safe keeping. The default braille generator does its thing and asks rolestuff.py for similar information. Rolestuff notes that it's the same blessed cell and returns the cached stuff.

Scenario: User does whereAmI on the table cell.

Response: whereAmI asks rolestuff.py (rather than the speech generator) for the info so that it can easily re-order it and insert pauses to make it all whereAmIish. Rolestuff sees that the current cell still hasn't changed and immediately returns the cached info.

I'm thinking we'd keep this table cell's info around until we find another table cell, at which point the stored cell info would be tossed in favor of the new cell info. Thus if the user presses Alt+Whatever to get into a menu, or Tab to land on a button, or for that matter Alt+Tab to switch to another application window, and then returns to that same cell (having not encountered another table cell in that application along the way), we can reuse the stored info.

I also wonder if the tutorial information is something that should be moved to rolestuff.

Provide ACSS per role and other things

Some users want finer granularity over the voices. One thing is to support custom ACSS instances per object role, but there are other things we should consider too, such as upper case, links, text selection, misspelling, quoted text in e-mails, a "system" voice, etc.

Provide short term (per event) caching

When doing things for speech and braille, we often calculate things multiple times. For example, determining the table headers for a cell is done once for speech and again for braille. We might consider doing something that provides a short term cache for storing the results of operations made while processing a single event. For example, the first time the column header label is calculated for a table cell when processing an event, the value could be saved away. Then, while still processing the same event, the next time the column header label is needed, it could be drawn from the cache instead of recalculated. It would be nice to make this somewhat automatic as well.
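
A minimal sketch of what "somewhat automatic" might look like is a decorator backed by a cache that gets cleared at the start of each event. The EventCache class, the column_header_label function, and process_event are hypothetical stand-ins, and the approach assumes the cached function's arguments are hashable.

```python
import functools


class EventCache:
    """Holds results computed while a single event is being processed."""

    def __init__(self):
        self._store = {}

    def clear(self):
        self._store.clear()

    def memoize(self, func):
        """Cache func's result per argument tuple until clear() is called."""
        @functools.wraps(func)
        def wrapper(*args):
            key = (func.__name__, args)
            if key not in self._store:
                self._store[key] = func(*args)
            return self._store[key]
        return wrapper


event_cache = EventCache()
calls = []  # records how often the expensive work really runs


@event_cache.memoize
def column_header_label(cell):
    calls.append(cell)  # stands in for expensive AT-SPI round trips
    return "Header for " + cell


def process_event(cell):
    """Hypothetical event handler: speech and braille both need the header."""
    event_cache.clear()                   # nothing survives from the last event
    speech = column_header_label(cell)    # computed
    braille = column_header_label(cell)   # served from the cache
    return speech, braille
```

Because the cache is emptied per event, stale data from an earlier event cannot leak into a later one, which is the main risk with longer-lived caches.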

Provide intermittent/timed braille messages

A number of times, people have requested the ability to flash a message on the braille display temporarily. We should look to provide a way to do this in Orca. Instead of a timed message, perhaps we provide a means for the user to click on something on the braille display (e.g., a cursor routing key) to make the message go away.

Move braille input support to the script

Cursor routing keys, for example, are handled by the braille module. We should look to make braille.py feed into orca.py and have it deliver events.

Support better language switching

Orca needs to be able to dynamically determine the language of the text to present and speak/braille the language appropriately. Note that language should be given priority over all other things, including the ACSS, though one should try to honor as many ACSS values as possible.

Dump debug.py and use the logging module

The logging module should be able to provide us all we need. In addition, we might consider getting rid of the logging levels and just make logging types. That is, instead of LEVEL_ALL, I might consider saying I want to log the LOG_EVENT, LOG_INFO, LOG_WARNING, LOG_SEVERE, etc., types.
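
The standard logging module can get close to "types instead of levels" with a filter. The sketch below tags each record with a log_type via the extra argument and lets a handler pass only the enabled types. The TypeFilter and ListHandler classes and the "orca.example" logger name are assumptions; only the LOG_* names come from the paragraph above.

```python
import logging


class TypeFilter(logging.Filter):
    """Pass only records whose 'log_type' is in the enabled set."""

    def __init__(self, enabled_types):
        super().__init__()
        self.enabled_types = set(enabled_types)

    def filter(self, record):
        return getattr(record, "log_type", None) in self.enabled_types


class ListHandler(logging.Handler):
    """Collects messages in a list so the example has no side effects."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


log = logging.getLogger("orca.example")  # hypothetical logger name
log.setLevel(logging.DEBUG)
log.propagate = False

handler = ListHandler()
handler.addFilter(TypeFilter({"LOG_EVENT", "LOG_SEVERE"}))
log.addHandler(handler)

log.debug("focus: event received", extra={"log_type": "LOG_EVENT"})
log.debug("settings loaded", extra={"log_type": "LOG_INFO"})
log.error("speech server died", extra={"log_type": "LOG_SEVERE"})
```

Here the LOG_INFO record is dropped even though it has the same numeric level as the LOG_EVENT record, which is the behavior levels alone cannot give us.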

Joanie: (Putting this here for now, and briefly, so I don't forget to ask later). How can we best indicate what "underlining" in braille is taking place (for attributes, selection, etc.)? Having this info available in 'debug.out' output will be handy for troubleshooting; and we have no way (that I'm aware of) for testing this area for regressions.

Highlighting for Magnification

Some users use flat review and magnification at the same time. We should do a better job of highlighting the object of interest.

Highlighting for Say All

As words are spoken, we should attempt to highlight them.

Speaking just selected text

Some users want to highlight/select text and then have it spoken to them.

See what we can do about pulling some Gecko functionality into default.py

If we make the argument that a web interface subsumes a regular GUI interface, then there might be a lot of logic and technique we can incorporate into default.py.

Improve flat review

Flat review is here to stay. We can easily make some things more efficient, such as speaking the current line, word, character -- there shouldn't be a need to create a whole braille context for those kinds of things if all the user wants is a spelling of the current object with focus.

We should also look into algorithms that provide a quicker and perhaps more localized way to build up the braille context. If possible, this can also help feed into the desire to have a 'screen mode' view of a window versus the 'logical mode' we provide today.

Don't assume standard navigation keys

We currently assume the standard navigation keys (e.g., up/down, left/right) are used to navigate text. This doesn't work well for apps that use other means to navigate text. For example, vi's navigation keys.

Only do the work that is needed

Right now, Orca always computes braille, speech, and magnification. It should only do the work that is necessary. For example, if braille is not used, don't compute the braille line.

Rethink presentation modes

Eitan: Will told me that back in the day, one of the primary navigation modes was going to be based off of the accessible tree's structure, and another mode would be "locus of focus" based. It seems today that focus driven presentation is the main, if not sole, presentation mode. But in reality we have at least two other presentation modes, "flat review", and "mouse review". It would be cool if these three modes would have a peer relationship and share a common class hierarchy. This would also allow sharing presentation code between these different modes, for example a check button would be spoken and rendered to braille in an identical manner in all modes. It would also allow users to extend Orca by deriving from a presentation class and "registering" a new presentation mode.

Will: I wonder if we should consider allowing multiple scripts per application?

Consider gconf for settings and also react to settings changes on the fly

The Python-based settings file is great for a lot of things, but we might consider going to gconf because we've had some pressure to do so. In addition, we may want to react to settings changes in gconf on the fly.

Figure out how to have multiple launchers for a singleton instance

We may want to allow the user to say "Launch the magnifier", "Give me a speech interface", "Give me braille" as separate items. A single running instance of Orca should be able to detect this and augment itself appropriately.

See if there's some way to "look down" the event queue

Joanie: More often than not, events seem to come in groups such as {focus, focused, caret-moved} when switching paragraphs, {text-changed:delete, caret-moved, text-changed:insert} when an app is rewriting displayed text, the goofy 6+ events that OOo Writer seems to like to give us when navigating by paragraph, and so on. Currently we handle events one by one and try to surmise what's going on based on keyevents, the locusOfFocus, the hierarchy/ancestry, object states, and the like. If we could figure out some way to "look down" the event queue and spot such groupings, we might be able to better judge what caused the event(s) and determine how to proceed. (Yes, it's a hard problem; no, it won't be a bullet-proof solution. Just suggesting we try. :-) )
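
As a very rough sketch of "looking down" the queue, the snippet below peeks at the head of a deque and pops a whole group when it matches a known pattern. The event names are the shorthand used in the paragraph above rather than real AT-SPI event type strings, and KNOWN_GROUPS, its labels, and next_group are made up for illustration.

```python
from collections import deque
from itertools import islice

# Hypothetical groupings, taken from the examples in the paragraph above.
KNOWN_GROUPS = {
    ("focus", "focused", "caret-moved"): "paragraph-switch",
    ("text-changed:delete", "caret-moved", "text-changed:insert"): "text-rewrite",
}


def next_group(queue):
    """Pop a known group if the head of the queue matches one, else one event."""
    for pattern, label in KNOWN_GROUPS.items():
        head = tuple(islice(queue, len(pattern)))  # peek without consuming
        if head == pattern:
            for _ in pattern:
                queue.popleft()
            return label, list(pattern)
    return "single", [queue.popleft()]
```

A real version would also have to cope with groups whose later events have not arrived yet, which is a large part of why this is a hard problem.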

Could/Should we consider threading?

Joanie: Performance issues are regularly raised by users, especially in regards to Firefox. I know nothing about threading other than the concept. :-) But based on that wee bit of information, I'm wondering if threading might help us with these issues. For instance, if the user is moving line by line, could we get the next line in a separate thread while the current line is being presented? If the user is moving by heading, could we get the first heading and present it while getting more headings in another thread? Etc.
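
For what it's worth, the "get the next line while presenting the current one" idea could look something like the sketch below, which uses the standard concurrent.futures module. The fetch_line function and the LINES list are stand-ins for slow accessibility queries, and whether AT-SPI calls are safe to make off the main thread is an open question this sketch does not answer.

```python
from concurrent.futures import ThreadPoolExecutor

LINES = ["first line", "second line", "third line"]


def fetch_line(index):
    """Stand-in for a slow accessibility query; returns None past the end."""
    return LINES[index] if 0 <= index < len(LINES) else None


def present_lines():
    """Present every line, fetching line N+1 while line N is presented."""
    presented = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        index = 0
        pending = pool.submit(fetch_line, index)
        while True:
            line = pending.result()
            if line is None:
                break
            # Start fetching the next line before presenting this one.
            index += 1
            pending = pool.submit(fetch_line, index)
            presented.append(line)  # stands in for speaking/brailling
    return presented
```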

Stop relying upon specific strings and specific hierarchies so much

Joanie: There are a number of places where we determine that a particular object is the specific object we're after by looking at the ancestry. This works great unless there just so happens to be a similar hierarchy elsewhere in the app and/or the app developers change their hierarchy (e.g. the OOo guys changing the document view from ROLE_UNKNOWN to ROLE_DOCUMENT_FRAME in OOo 3.0 broke quite a bit -- at least they didn't change the number of objects!).

There are a few places where we look to see if the name of some object (e.g. the frame) ends with a particular string (e.g. the default script's locusOfFocusChanged() decides whether or not to do something based on whether or not the top level object's name happens to end with "Thunderbird" -- the app which, by the way, is currently going by the name of Shredder).

Wherever possible, we should be looking at conditions instead.

Refactoring Coding Guidelines

import logging
log = logging.getLogger('<modulename>')

Architecture/Implementation (in progress)

The old architecture tended to result in big humongous modules. This was mainly because a script was responsible for handling all the interaction with the application. The new architecture provides the notion of "plugins" that allow us to split functionality into separate modules. Let's take a look at this from the top down.

orca.py and script_manager.py

The main job of the orca.py module is to parse the command line, read in the global settings, create a singleton instance of script_manager.py:ScriptManager, and start the main loop via pyatspi.Registry.start(). orca.py no longer listens for events or does other work such as learn mode or key echo.

script_manager.py:ScriptManager is meant to be created as a singleton instance for Orca. Its main job is to create and manage script.py:Script instances and to make sure events make it to the right script. script_manager.py:ScriptManager registers for input device events and listens for window activation events. When it sees that a window has been activated, it makes sure a script has been created for the application associated with that window and calls that script the "active script". When it receives input device events, script_manager.py:ScriptManager delivers them to the "active script" for processing.

TODO: learn mode may need to go in script_manager.py:ScriptManager

TODO: key echo may need to go in script_manager.py:ScriptManager

script.py

The script_manager.py:ScriptManager creates a single instance of script.py:Script for each application that is discovered on the desktop. The script.py:Script instances are created 'lazily' in that they are made only when a window for an application is activated.
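
The lazy creation and "active script" behavior described above can be sketched in a few lines. This is a simplified stand-in, not the real script_manager.py: the method names (on_window_activated, on_input_event, process_input_event) are assumptions, and applications are identified by a plain string rather than an accessible.

```python
class Script:
    """Stand-in for script.py:Script."""

    def __init__(self, app_name):
        self.app_name = app_name
        self.received = []

    def process_input_event(self, event):
        self.received.append(event)


class ScriptManager:
    """Creates scripts lazily and routes input events to the active one."""

    def __init__(self):
        self._scripts = {}
        self.active_script = None

    def on_window_activated(self, app_name):
        # A script is only made the first time one of the app's windows
        # is activated; after that the same instance is reused.
        if app_name not in self._scripts:
            self._scripts[app_name] = Script(app_name)
        self.active_script = self._scripts[app_name]
        return self.active_script

    def on_input_event(self, event):
        if self.active_script is not None:
            self.active_script.process_input_event(event)
```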

The main jobs of a script.py:Script instance are to provide convenience functions that "normalize" the AT-SPI for the application it is working with and to manage plugin.py:Plugin instances. For example, it might work around quirks in the AT-SPI implementation of the toolkit for the application.

The scripts.default.py:Script instance is the primary script that attempts to normalize the AT-SPI for all toolkits. It is expected that one will subclass scripts.default.py:Script for toolkits (placing the subclassed script under the scripts.toolkits module) and applications (placing the subclassed script under the scripts.applications module).

As part of the AT-SPI "normalization", script.py:Script keeps track of two things in the application:

  1. focus: the current AT-SPI accessible that has the AT-SPI STATE_FOCUSED state

  2. locus: the current 'point of regard,' 'object of interest,' or whatever you want to call it. It's basically a list of accessible objects that express what the user cares about at the moment. TODO: need to define the exact stuff for each object, but the thoughts are that we'd have a dictionary with keys to get the following (TODO: probably also need something to express if the locus comprises a character, word, sentence, paragraph, line, accessible, "Where Am I", etc.):

    • accessible
    • name
    • value
    • caretOffset
    • startOffset
    • endOffset
    • string (and text attributes?)
    • row
    • column
    • activeDescendantInfo
    • textSelections

The script.py:Script class is a subclass of gobject.GObject, allowing it to emit events. It will emit focus-changed and locus-changed events when the values change, passing both the old and new values.

Introspection for AT-SPI Object Event Listeners and Input Event Handlers

The script.py:Script class supports introspection on itself to automatically discover and register AT-SPI object events and input device events. The way this works is via method naming conventions: AT-SPI object event listener methods end with the string "Listener" and input device event handler methods end with the string "Handler".

Each "Listener" method is expected to have an events attribute that is a list of strings representing the AT-SPI object event types the listener cares about. For example, the following code defines a _valueChangedListener method that listens for AT-SPI object:value-changed and object:property-change:accessible-value events.

def _valueChangedListener(self, event):
    """Called on AT-SPI object:value-changed and
       object:property-change:accessible-value events.

    Arguments:
    - self: this Script instance
    - event: a pyatspi.event.Event instance
    """
    pass  # ...do some very important stuff here...
_valueChangedListener.events = ["object:value-changed",
                                "object:property-change:accessible-value"]

Each "Handler" method is expected to have two attributes:

  1. A description attribute that is a human consumable description for what the handler does

  2. A bindings list that contains a list of input_binding.InputBinding instances that describe the input events that will cause the handler to be called.

For example, the following code defines a reviewCurrentLineHandler method that sets up a description attribute and a bindings attribute that contains a number of different bindings. Note that the clickCount attribute is generally not needed unless you have bindings that do one thing for a single click of a key and a double or triple click of the same key. In addition, the keyboardLayout attribute is only needed if you want to set up different bindings for desktop or laptop layouts.

def reviewCurrentLineHandler(self, inputEvent=None):
    """The reviewCurrentLineHandler handler.
    """
    pass  # ...do some very important stuff here...
# Translators: the 'flat review' feature of Orca                            
# allows the blind user to explore the text in a                            
# window in a 2D fashion.  That is, Orca treats all                         
# the text from all objects in a window (e.g.,                              
# buttons, labels, etc.) as a sequence of words in a                        
# sequence of lines.  The flat review feature allows                        
# the user to explore this text by the {previous,next}                      
# {line,word,character}.  This particular command will                      
# cause Orca to speak the current line.                                     
#                                                                           
reviewCurrentLineHandler.description = \
    _("Speaks the current flat review line.")
reviewCurrentLineHandler.bindings = [
    input_binding.KeyboardBinding(
        "KP_8",
        input_event.defaultModifierMask,
        input_event.NO_MODIFIER_MASK,
        clickCount = 1,
        keyboardLayout = settings.GENERAL_KEYBOARD_LAYOUT_DESKTOP),
    input_binding.KeyboardBinding(
        "KP_Up",
        input_event.defaultModifierMask,
        input_event.NO_MODIFIER_MASK,
        clickCount = 1,
        keyboardLayout = settings.GENERAL_KEYBOARD_LAYOUT_DESKTOP),
    input_binding.KeyboardBinding(
        "i",
        input_event.defaultModifierMask,
        input_event.ORCA_MODIFIER_MASK,
        clickCount = 1,
        keyboardLayout = settings.GENERAL_KEYBOARD_LAYOUT_LAPTOP),
    input_binding.BrailleBinding(
        brlapi.KEY_CMD_TOP_LEFT,
        input_event.NO_MODIFIER_MASK,
        input_event.NO_MODIFIER_MASK)
]
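
The naming conventions in this section lend themselves to a small discovery routine. The sketch below uses the standard inspect module to find "Listener" and "Handler" methods on an instance and index them by event type and by binding. It is not the real script.py code: the discover function is hypothetical, and plain strings stand in for input_binding.InputBinding instances.

```python
import inspect


class Script:
    """Stand-in script with one listener and one handler."""

    def _valueChangedListener(self, event):
        return "value changed"
    _valueChangedListener.events = ["object:value-changed"]

    def reviewCurrentLineHandler(self, inputEvent=None):
        return "reviewed"
    reviewCurrentLineHandler.description = "Speaks the current flat review line."
    # Plain strings stand in for input_binding.InputBinding instances.
    reviewCurrentLineHandler.bindings = ["KP_8", "KP_Up"]


def discover(script):
    """Return (listeners by event type, handlers by binding) for a script."""
    listeners, handlers = {}, {}
    for name, method in inspect.getmembers(script, inspect.ismethod):
        if name.endswith("Listener"):
            for event_type in getattr(method, "events", []):
                listeners.setdefault(event_type, []).append(method)
        elif name.endswith("Handler"):
            for binding in getattr(method, "bindings", []):
                handlers[binding] = method
    return listeners, handlers
```

Attributes set on the function in the class body remain readable through the bound method, so the description and bindings travel with the handler wherever it is passed.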

NOTE: script.py:Script instances generally should not listen directly for AT-SPI object events or input device events. Their subclass, plugin.py:Plugin, is meant to do this. However, there may be the occasional case where script.py:Script instances need to listen for events.

TODO: the script.py:Script subclasses currently hardcode their plugins (see scripts.default.py:_getPluginClasses). This should be more dynamic.

TODO: the focus and locus attributes of a script should probably be implemented as Python descriptors (see http://users.rcn.com/python/download/Descriptor.htm). We expect, however, to be able to emit more than just the old and new values. We're toying with the idea of including the notion of "senses" that the change applies to: speech, audio, braille, magnification, visual enhancements, etc.
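
A minimal sketch of the descriptor idea follows. The NotifyingAttribute class is hypothetical, and the emit method here just records what would be emitted; a real script would emit a gobject signal instead. The "senses" notion is left out since it is not defined yet.

```python
class NotifyingAttribute:
    """Descriptor that reports old and new values whenever they differ."""

    def __set_name__(self, owner, name):
        self.name = name
        self.storage = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return getattr(instance, self.storage, None)

    def __set__(self, instance, value):
        old = getattr(instance, self.storage, None)
        setattr(instance, self.storage, value)
        if old != value:
            instance.emit(self.name + "-changed", old, value)


class Script:
    focus = NotifyingAttribute()
    locus = NotifyingAttribute()

    def __init__(self):
        self.emitted = []

    def emit(self, signal, old, new):
        # A real script would emit a gobject signal here.
        self.emitted.append((signal, old, new))
```

With this, a plugin only has to assign to script.focus or script.locus; the bookkeeping about whether anything changed lives in one place.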

plugin.py

The plugin.py:Plugin class is a subclass of script.py:Script and is the thing that does most of the work in Orca. plugin.py:Plugin instances are owned by a script.py:Script instance, which passes them AT-SPI object and input device events in the order in which the plugins were added to the script.

In general, plugins listen for AT-SPI events (see the "Listener" introspection description under the script.py section above) and input device events (see the "Handler" introspection description under the script.py section above).

The primary purpose of listening for AT-SPI events is to change the focus and/or locus attribute of the script that owns it (self._script). This will cause the script to emit a focus-changed or locus-changed signal that can be handled by an object referred to as a "channel" (see below).

The primary purpose of listening for input device events is to allow the user to tell the plugin to do something. What that something is depends upon the plugin. It could be for flat review, changing speech parameters, obtaining "Where am I" information, etc.

Channels

Channels are something we're thinking about for the new Orca, but we're not sure about it yet. A "channel" is merely something that knows how to present something to the user. For example, there might be a speech channel, a braille channel, a magnification channel, etc. In general, a channel will listen for signals (e.g., focus-changed or locus-changed) from a script and then make an appropriate presentation to the user.

TODO: a channel could simply be a plugin.py:Plugin instance.

TODO: the goal of channels is to allow more and different channels to be easily added in the future. They may introduce a ton of complexity, such as trying to figure out how a plugin can provide the channel with sufficient information for what should be presented to the user (e.g., the "Where Am I" plugin making sure a braille channel and speech channel get enough information to present their information in the unique way they need to present it). As such, we may end up abandoning this effort and going to a design where the plugin itself has direct control over the presentation in speech, braille, magnification, etc.

Utilities

utils.py

The utils.py module contains a number of utilities that can be used by any module. The module also contains some methods for registering/deregistering AT-SPI object event listeners as well as setting a global callback for AT-SPI events. Of special note is the utils.py:setEventListenerCallback method for setting the callback: it should only be called by script_manager.py:ScriptManager since it is what handles the queuing and dequeuing of events.

braille/brltty.py

The braille/brltty.py module is for dealing specifically with BrlTTY. It has methods for writing to the display (writeText) and setting which BrlTTY commands Orca will listen to (setKeys). The methods of the braille/brltty.py module are imported into the braille module itself, so that one can call braille.writeText directly. The brltty.py module should only be initialized by script_manager.py:ScriptManager since it is what handles the queuing and dequeuing of braille input device events.

TODO: need to add the braille regions support back in.

settings.py

The settings.py module defines a Settings class that is a specialized dictionary whose initializer takes two other Settings instances: delegate and override. The delegate notion is that if a Settings instance doesn't have a key, then it will look to the delegate. This allows plugins and scripts to defer to their owners and/or superclasses for settings. The override notion is that this will be searched first, before looking locally. This allows command line settings to take precedence.
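
The lookup order described above (override first, then local, then delegate) can be sketched as follows. This is not the real settings.py: the lookup method name and the setting names used in the usage example are assumptions for illustration.

```python
class Settings(dict):
    """Dictionary that consults an override first and a delegate last."""

    def __init__(self, delegate=None, override=None, **values):
        super().__init__(**values)
        self.delegate = delegate
        self.override = override

    def lookup(self, key):
        # 1. The override (e.g., command line settings) always wins.
        if self.override is not None:
            try:
                return self.override.lookup(key)
            except KeyError:
                pass
        # 2. Then the values stored locally.
        if key in self:
            return self[key]
        # 3. Finally defer to the owner or superclass settings.
        if self.delegate is not None:
            return self.delegate.lookup(key)
        raise KeyError(key)


# Hypothetical setting names, for illustration only.
command_line = Settings(speechRate=90)
global_settings = Settings(speechRate=50, verbosity="verbose", enableBraille=True)
script_settings = Settings(delegate=global_settings,
                           override=command_line,
                           verbosity="brief")
```

In this arrangement the script's own "verbosity" shadows the global one, the command line "speechRate" beats both, and "enableBraille" falls through to the global settings.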

Plugins

These are examples of plugins. Their main job is to listen for AT-SPI and input device events and then do something. If the "channel" concept is implemented, these do something (e.g., cause a gobject signal to be emitted) to cause the channels to do the presentation. If channels are not implemented, then these can talk to speech, braille, and magnification directly.

Status


The information on this page and the other Orca-related pages on this site are distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Orca/Refactor (last edited 2008-10-06 21:07:18 by WillieWalker)