Orca Regression Testing

Main Ideas

The main ideas behind the Orca test harness are as follows:

Harness Directory Layout

The Orca regression tests contained in the test directory are laid out as follows:

Prerequisites

The first prerequisite is you: among other things, you need the knowledge, skills, and permission to build and install modules from the GNOME source repositories, to add users, and to run commands as root. The rest of this page gives examples of how to do these things on Ubuntu and Solaris, but you are expected to already know what these commands are and how to run them. This page is not intended to be a guide to system administration or application development.

Macaroon

To run any of the tests, you need to build/install Macaroon. Macaroon can be obtained, built, and installed by issuing the following commands:

git clone git://git.gnome.org/accerciser
cd accerciser/macaroon
./autogen.sh
make
sudo make install

gtk-demo

The tests also require various applications to be installed, including gtk-demo. On Solaris, gtk-demo is available at /usr/demo/jds/bin/gtk-demo. To make things go more smoothly on Solaris, create a symbolic link from /usr/bin/gtk-demo to /usr/demo/jds/bin/gtk-demo. On Ubuntu, you can install gtk-demo via the following command:

sudo apt-get install gtk2.0-examples

trace2html

To do code coverage analysis, you need to grab Olivier Grisel's trace2html 0.2.1 and apply the patch in test/harness/trace2html-coverage-patch.txt. You can apply the patch and install trace2html via the following commands:

gunzip -c trace2html-0.2.1.tar.gz | tar xvf -
cd trace2html-0.2.1
<<<copy your trace2html-coverage-patch.txt to the current directory>>>
patch -p0 src/trace2html.py < trace2html-coverage-patch.txt
sudo python setup.py install

Python cProfile

To do performance profiling, you need the Python profiler module ("import cProfile"). (WDW: directions are still needed for getting this on Solaris as of SXDE 01/08. I think I remember it being there by default at one time in the past.) On Ubuntu, it can be obtained via the following command:

sudo apt-get install python-profiler

Main Files

Writing Tests Using Macaroon

See the writing tests page.

Running the Regression Tests

Set up an 'orca' Test Account

It is best to run regression tests from a different user account than the account you normally log into. This will help avoid conflicts with things such as personal preferences for theming as well as using 'point to focus' in a window versus the default 'click to focus'. The preferred username is orca, and this user should only use the default GNOME desktop settings. The main things of importance are:

Run the Harness

The harness is designed to be run from the test/harness directory. Don't run it from anywhere else or bad things might happen. To run the harness, merely run the runall.sh script when sitting in the test/harness directory:

./runall.sh > runall.out 2>&1

To run the tests for just one application, pass the absolute path to that application's test directory using the -a parameter to runall.sh. For example:

./runall.sh -a `pwd`/../keystrokes/oowriter > runall.out 2>&1

If you want to specify a different PATH, you can do so quite easily. This makes testing different versions of an application easier. For example:

PATH=~/Desktop/firefox:$PATH ./runall.sh -a `pwd`/../keystrokes/firefox > runall.out 2>&1

The runall.sh script will run through all the keystrokes and output summary information for the tests to the console. So, redirecting the output to runall.out (as shown above) is a useful way to be able to save the output for later examination. As part of a run, you might see output such as the following:

Test 1 of 1 FAILED: /export/home/orca/orca/trunk-3743/test/harness/../keystrokes
/gtk-demo/debug_commands.py:Report script information
EXPECTED:
     "BRAILLE LINE:  'SCRIPT INFO: Script name='gtk-demo (module=orca.default)'
Application name='gtk-demo' Toolkit name='GAIL' Version='1.20.0''",
     "     VISIBLE:  'SCRIPT INFO: Script name='gtk-de', cursor=0",
     "SPEECH OUTPUT: 'SCRIPT INFO: Script name='gtk-demo (module=orca.default)'
Application name='gtk-demo' Toolkit name='GAIL' Version='1.20.0''",
ACTUAL:
     "BRAILLE LINE:  'SCRIPT INFO: Script name='gtk-demo (module=orca.default)'
Application name='gtk-demo' Toolkit name='GAIL' Version='1.21.5''",
     "     VISIBLE:  'SCRIPT INFO: Script name='gtk-de', cursor=0",
     "SPEECH OUTPUT: 'SCRIPT INFO: Script name='gtk-demo (module=orca.default)'
Application name='gtk-demo' Toolkit name='GAIL' Version='1.21.5''",
[FAILURE WAS UNEXPECTED]

Unexpected failures are not good. When you get one of these, you should compare the output from the 'EXPECTED' section to the output of the 'ACTUAL' section and then work to resolve the differences.
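When the EXPECTED and ACTUAL sections are long, a line-by-line diff can make the mismatch easier to spot. The following is a minimal sketch and not part of the harness: it assumes you have copied the two sections into Python lists of strings, and the function name show_differences is hypothetical. The sample strings are shortened versions of the output above.

```python
import difflib

def show_differences(expected, actual):
    """Return only the diff lines where expected and actual disagree."""
    diff = difflib.ndiff(expected, actual)
    # ndiff marks removed lines with "- ", added lines with "+ ", and
    # hint lines with "? "; unchanged lines start with two spaces.
    return [line for line in diff if not line.startswith("  ")]

expected = [
    "BRAILLE LINE:  'SCRIPT INFO: Toolkit name='GAIL' Version='1.20.0''",
    "     VISIBLE:  'SCRIPT INFO: Script name='gtk-de', cursor=0",
]
actual = [
    "BRAILLE LINE:  'SCRIPT INFO: Toolkit name='GAIL' Version='1.21.5''",
    "     VISIBLE:  'SCRIPT INFO: Script name='gtk-de', cursor=0",
]

for line in show_differences(expected, actual):
    print(line)
```

In this example only the BRAILLE LINE entries are reported, which points at the changed version string as the cause of the failure.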

You might also see output with KNOWN ISSUE in it:

Test 3 of 5 FAILED: /export/home/orca/orca/trunk-3743/test/harness/../keystrokes
/gtk-demo/role_radio_button.py:Range radio button
EXPECTED:
     "KNOWN ISSUE - the radio button should be presented as selected.",
     "BRAILLE LINE:  'gtk-demo Application Print Dialog TabList General Page Pri
nt Pages Filler & y Range RadioButton'",
     "     VISIBLE:  '& y Range RadioButton', cursor=1",
     "SPEECH OUTPUT: ''",
     "SPEECH OUTPUT: 'Range not selected radio button'",
ACTUAL:
     "BRAILLE LINE:  'gtk-demo Application Print Dialog TabList General Page Pri
nt Pages Filler & y Range RadioButton'",
     "     VISIBLE:  '& y Range RadioButton', cursor=1",
     "SPEECH OUTPUT: ''",
     "SPEECH OUTPUT: 'Range not selected radio button'",
[FAILURE WAS EXPECTED - LOOK FOR KNOWN ISSUE IN EXPECTED RESULTS]
Test 5 of 5 FAILED: /export/home/orca/orca/trunk-3743/test/harness/../keystrokes
/gtk-demo/role_radio_button.py:All radio button
EXPECTED:
     "KNOWN ISSUE - the radio button should be presented as selected.",
     "BRAILLE LINE:  'gtk-demo Application Print Dialog TabList General Page Pri
nt Pages Filler & y All RadioButton'",
     "     VISIBLE:  '& y All RadioButton', cursor=1",
     "SPEECH OUTPUT: ''",
     "SPEECH OUTPUT: 'All not selected radio button'",
ACTUAL:
     "BRAILLE LINE:  'gtk-demo Application Print Dialog TabList General Page Pri
nt Pages Filler & y All RadioButton'",
     "     VISIBLE:  '& y All RadioButton', cursor=1",
     "SPEECH OUTPUT: ''",
     "SPEECH OUTPUT: 'All not selected radio button'",
[FAILURE WAS EXPECTED - LOOK FOR KNOWN ISSUE IN EXPECTED RESULTS]

The presence of KNOWN ISSUE in the expected results is a reminder of an issue that the team is aware of, but cannot fix.
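If you post-process saved output yourself, the same convention can be checked mechanically. The sketch below only mirrors the rule visible in the sample output (an expected line beginning with KNOWN ISSUE). It is an assumption, not necessarily how the harness itself makes the decision, and failure_is_expected is a hypothetical name.

```python
def failure_is_expected(expected_lines):
    """Return True if any expected line is a KNOWN ISSUE marker."""
    return any(
        # Drop the indentation and the opening quote used in the report.
        line.strip().strip('"').startswith("KNOWN ISSUE")
        for line in expected_lines
    )

sample_expected = [
    '     "KNOWN ISSUE - the radio button should be presented as selected.",',
    '     "SPEECH OUTPUT: \'Range not selected radio button\'",',
]
print(failure_is_expected(sample_expected))
```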

Finally, after each test file is run, you should see summary output similar to the following:

SUMMARY: 4 SUCCEEDED and 0 FAILED (0 UNEXPECTED) of 4 for /export/home/orca/orca
/trunk-3743/test/harness/../keystrokes/gtk-demo/role_radio_menu_item.py

A quick way to analyze a saved runall.out file is via this command:

egrep "SUMMARY|FAILED" runall.out | grep -v " 0 FAILED"
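For a more structured view, the SUMMARY lines can be parsed into numbers rather than matched as text. This is a minimal sketch, assuming each SUMMARY entry sits on a single line in runall.out in the format shown above. The function name failing_summaries and the sample paths are hypothetical.

```python
import re

SUMMARY_RE = re.compile(
    r"SUMMARY: (\d+) SUCCEEDED and (\d+) FAILED \((\d+) UNEXPECTED\) "
    r"of (\d+) for (.+)"
)

def failing_summaries(text):
    """Yield (test file, failed, unexpected) for files with failures."""
    for line in text.splitlines():
        match = SUMMARY_RE.match(line)
        if match is None:
            continue  # not a SUMMARY line
        failed = int(match.group(2))
        unexpected = int(match.group(3))
        if failed > 0:
            yield match.group(5).strip(), failed, unexpected

sample = (
    "SUMMARY: 4 SUCCEEDED and 0 FAILED (0 UNEXPECTED) of 4 for "
    "/tmp/keystrokes/gtk-demo/role_radio_menu_item.py\n"
    "SUMMARY: 3 SUCCEEDED and 2 FAILED (0 UNEXPECTED) of 5 for "
    "/tmp/keystrokes/gtk-demo/role_radio_button.py\n"
)

for path, failed, unexpected in failing_summaries(sample):
    print("%s: %d failed (%d unexpected)" % (path, failed, unexpected))
```

Comparing the parsed counts as integers avoids depending on substring matches against the line.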

If you observe unexpected failures as part of a run, you can examine the debug logs in more detail. The runall.sh script saves the results to a directory whose name is of the form YYYY-MM-DD_HH:MM:SS (e.g., 2006-11-29_20:21:41). This directory should contain a set of directories that matches those in the ./keystrokes directory. Under each of those directories are files containing the reference speech and braille output from a run of the associated *.py file. For each test, there are five files: *.speech.unfiltered, *.speech, *.braille.unfiltered, *.braille, and *.debug. The *.debug files contain Orca debug output obtained during the run and are likely to differ between runs of the harness. They are useful, however, for analyzing regression differences if they occur. The *.unfiltered files contain the exact output of Orca, whereas the other files contain a filtered form that helps with repeatability of test results.
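Before digging into a results directory, it can help to confirm that every test produced all five files. The following sketch is not part of the harness. It assumes that the files for one test share a common base name followed by one of the five suffixes listed above. The function name missing_outputs and the demo file names are hypothetical.

```python
import os
import tempfile

EXPECTED_SUFFIXES = (".speech.unfiltered", ".speech",
                     ".braille.unfiltered", ".braille", ".debug")

def missing_outputs(results_dir):
    """Map each test's base path to the suffixes it is missing."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(results_dir):
        for name in filenames:
            for suffix in EXPECTED_SUFFIXES:
                if name.endswith(suffix):
                    base = os.path.join(dirpath, name[:-len(suffix)])
                    found.setdefault(base, set()).add(suffix)
                    break
    wanted = set(EXPECTED_SUFFIXES)
    return {base: sorted(wanted - seen)
            for base, seen in found.items() if wanted - seen}

# Demo on a throwaway directory that stands in for a real results directory.
demo_root = tempfile.mkdtemp()
demo_app = os.path.join(demo_root, "gtk-demo")
os.mkdir(demo_app)
for suffix in EXPECTED_SUFFIXES:
    open(os.path.join(demo_app, "complete" + suffix), "w").close()
open(os.path.join(demo_app, "partial.speech"), "w").close()

for base, missing in missing_outputs(demo_root).items():
    print(base, "is missing", ", ".join(missing))
```

To use it on a real run, pass the timestamped results directory instead of the demo directory.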

Running Just One Test

As you are creating tests or debugging a particular problem, it is useful to be able to run just one test. You can use the runone.sh script for this:

./runone.sh <*.py test file> <app-name> [0|1]

With this command:

Here's an example:

./runone.sh ../keystrokes/gtk-demo/role_radio_button.py gtk-demo 0

Running Code Coverage Analysis

Remember that you need to have Olivier Grisel's trace2html 0.2.1 with the trace2html-coverage-patch.txt patch applied as described above.

Code coverage analysis is then obtained by running runall.sh with the -c parameter:

./runall.sh -c

The coverage results will be placed in ../coverage/<YYYY-MM-DD_HH:MM:SS>.

Running Performance Analysis

Performance analysis is obtained by running runall.sh with the -p parameter:

./runall.sh -p

Remember that you need to have the Python profile module installed (sudo apt-get install python-profiler). The performance results will be placed in ../profile/<YYYY-MM-DD_HH:MM:SS>. The *.orcaprof file is a raw data profile file. The *.txt file is a processed version of the *.orcaprof file, sorted by the cumulative time spent in each method.
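If you want to regenerate a text report from a raw profile yourself, the standard pstats module can write one to a file. This is a sketch of one way to do it, not the harness's own code. The function name write_profile_report, the limit parameter, and the demo file names are hypothetical. Note that the profile file format is not guaranteed to be readable by a different Python version from the one that wrote it.

```python
import cProfile
import os
import pstats
import tempfile

def write_profile_report(raw_profile, report_path, limit=20):
    """Write the `limit` costliest entries, sorted by cumulative time."""
    with open(report_path, "w") as report:
        stats = pstats.Stats(raw_profile, stream=report)
        stats.strip_dirs().sort_stats("cumulative").print_stats(limit)

# Demo: profile a trivial statement so the example is self-contained.
workdir = tempfile.mkdtemp()
raw_file = os.path.join(workdir, "demo.orcaprof")
report_file = os.path.join(workdir, "demo.txt")
cProfile.run("sorted(range(1000))", raw_file)
write_profile_report(raw_file, report_file)

with open(report_file) as report:
    print(report.read())
```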

Doing a Performance Analysis Manually

You might just want to do a quick check or test by running Orca manually, experimenting with an app or feature, and then analyze the performance of that. You can do that by running test/harness/runprofiler.py to run Orca with profiling enabled. Do your manual experimentation here and then quit Orca. The raw binary profile data will be saved in a file called "orcaprof". You can analyze the data using commands such as the following:

python -c "import pstats; pstats.Stats('orcaprof').sort_stats('cumulative').print_stats()"

Nightly Tests

WDW has been experimenting with nightly tests on OpenSolaris 2008.11 (obtained and installed using the accessible install instructions). Here's what he did:

  1. Created an 'orca' test user and set it up
  2. AS THE 'orca' USER: Ran vncserver -ac :1 to set up VNC. The nightly test starts a VNC session to give the test user an X server to use, so the tests can run on a headless system and/or a machine where nobody is logged into the console. This also creates an xstartup file that you will edit in the following steps.
  3. Set up the 'orca' user's vnc server's xstartup file (see below)
  4. Set up a nightly script to be run via a cron job

Set Up VNC

  1. AS THE 'orca' USER, first run vncserver -ac :1 if you haven't already done so.

  2. Then, run vncserver -kill :1 and give ~orca/.vnc/xstartup these contents:

[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources
vncconfig -iconic &
gnome-session &

Create Nightly Script

Here's the nightly script for OpenSolaris. It lives in ~orca/bin/orca_nightly_test, and it does the following:

  1. Pulls sources from SVN trunk, builds them, and installs them under /tmp (WDW: this needs updating for Git!)
  2. Runs pylint on the code
  3. Runs the gtk-demo and firefox regression tests
  4. Sends mail only on failure in pylint or the regression tests

Note that this script also determines the DBUS_SESSION_BUS_ADDRESS, which is needed for the tests to communicate with Orca.

# NOTE: This assumes TLS on the SMTP server.
#
# $1 = SMTP server
# $2 = SMTP server username
# $3 = SMTP server password
# $4 = Your e-mail address
#

. ~/.bash_profile ""

# Get the VNC server going and also make sure we can connect
# to the D-Bus session bus.
#
vncserver -kill :1
bonobo-slay -s
vncserver -ac :1
sleep 30
eval `~/bin/get_dbus`
export DBUS_SESSION_BUS_ADDRESS

env 

# Now, check out from trunk, build it, and install it.
#
cd
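rm -rf orca/trunk
# Clone into orca/trunk so that the paths used below still apply.
git clone git://git.gnome.org/orca orca/trunk
# Git has no svnversion; use the abbreviated commit hash as the version tag.
SVNVERSION=`cd orca/trunk && git rev-parse --short HEAD`
mv orca/trunk orca/trunk-$SVNVERSION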
cd orca/trunk-$SVNVERSION
./autogen.sh --prefix=/tmp/orca-$SVNVERSION
make
make install
export PATH=/tmp/orca-$SVNVERSION/bin:$PATH

# Run pylint and make a summary of the bad results.
#
./run_pylint.sh src/orca/*.py src/orca/scripts/*.py 
grep "Your code has been" *.pylint | grep -v "10[.]00" > pylint_summary.out
echo "PYLINT RESULTS:"
cat pylint_summary.out

# Run the gtk-demo tests and make a summary of the bad results.
#
export DISPLAY=:1
xmodmap -e "keycode 23 = Tab ISO_Left_Tab"
xmodmap -e "keycode  79 = KP_Home KP_7 F27 KP_7 F27"
xmodmap -e "keycode  80 = KP_Up KP_8 F28 KP_8 F28"
xmodmap -e "keycode  81 = KP_Prior KP_9 F29 KP_9 F29"
xmodmap -e "keycode  83 = KP_Left KP_4 F30 KP_4 F30"
xmodmap -e "keycode  84 = KP_Begin KP_5 F31 KP_5 F31"
xmodmap -e "keycode  85 = KP_Right KP_6 F32 KP_6 F32"
xmodmap -e "keycode  87 = KP_End KP_1 F33 KP_1 F33"
xmodmap -e "keycode  88 = KP_Down KP_2 F34 KP_2 F34"
xmodmap -e "keycode  89 = KP_Next KP_3 F35 KP_3 F35"
xmodmap -pke

cd test/harness
./runall.sh -a `pwd`/../keystrokes/gtk-demo > gtk-demo.out 2>&1
egrep "SUMMARY" gtk-demo.out | grep -v " 0 FAILED" > gtk-demo_summary.out
echo "GTK-DEMO RESULTS:"
cat gtk-demo_summary.out

export GTK_MODULES=
./runall.sh -a `pwd`/../keystrokes/firefox > firefox.out 2>&1
egrep "SUMMARY" firefox.out | grep -v " 0 FAILED" > firefox_summary.out
echo "FIREFOX RESULTS:"
cat firefox_summary.out

# Put the pylint and regression test summaries together.
#
cd ../..
cat pylint_summary.out test/harness/gtk-demo_summary.out test/harness/firefox_summary.out > full_summary.out

# Send an e-mail only on failure.
#
NUMLINES=`cat full_summary.out | wc -l`
if [ $NUMLINES -ne 0 ]
then
INFO=`uname -a`
MACHINE=`hostname`
ME=`whoami`
SUBJECT="URGENT: orca-$SVNVERSION test failures on $INFO"
python $HOME/bin/mailit.py << EOF
$1
$2
$3
$ME@$MACHINE
$4
$SUBJECT
full_summary.out
EOF
fi

The get_dbus script looks like this:

MYID=`id -u`
GNOME_SESSION_PID=`ps -u $MYID -f | grep gnome-session$ | grep -v dbus | awk '{ print $2 }'`
pargs -e $GNOME_SESSION_PID | grep DBUS_SESSION_BUS_ADDRESS | awk '{ print $2 }'
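The get_dbus script prints a NAME=value assignment, which is why the nightly script can pass its output to eval. The extraction step can be sketched in Python as follows. This assumes that pargs -e prints one environment entry per line in the form "envp[N]: NAME=value". The sample text is made up for illustration, and find_env_assignment is a hypothetical name.

```python
def find_env_assignment(pargs_output, name="DBUS_SESSION_BUS_ADDRESS"):
    """Return the NAME=value field for `name`, or None if it is absent."""
    for line in pargs_output.splitlines():
        fields = line.split()
        # Like awk '{ print $2 }', look at the second field of each line.
        if len(fields) >= 2 and fields[1].startswith(name + "="):
            return fields[1]
    return None

# Made-up sample in the assumed pargs -e output format.
sample = (
    "1234:\tgnome-session\n"
    "envp[0]: LANG=C\n"
    "envp[1]: DBUS_SESSION_BUS_ADDRESS=unix:path=/tmp/dbus-abc123,guid=0123\n"
)
print(find_env_assignment(sample))
```

Unlike the grep in the script, this sketch requires the variable name to start the second field, so it does not match other variables whose values merely contain the name.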

The .bash_profile and the .bashrc file it calls look like this:

if [ -f ~/.bashrc ]; then
source ~/.bashrc
fi

. /opt/dtbld/bin/env.sh
export PATH=$PATH:/usr/X11/bin:/usr/openwin/bin:/usr/demo/jds/bin
export MANPATH=/usr/gnu/share/man:/usr/share/man:/usr/X11/share/man
export PAGER="/usr/bin/less -ins"
PS1='${LOGNAME}@$(/usr/bin/hostname):$(
    [[ "${LOGNAME}" == "root" ]] && printf "%s" "${PWD/${HOME}/~}# " ||
    printf "%s" "${PWD/${HOME}/~}\$ ")'

This also requires a ~orca/bin/mailit.py file, which sends mail via an SMTP server:

import smtplib

def prompt(prompt):
    return raw_input(prompt).strip()

smtpserver = prompt("SMTP Server: ")
username = prompt("Username: ")
password = prompt("Password: ")
fromaddr = prompt("From: ")
toaddrs = prompt("To: ").split()
subject = prompt("Subject: ")
filename = prompt("File: ")

msg = ("From: %s\r\nTo: %s\r\nSubject: %s\r\n\r\n"
       % (fromaddr, ", ".join(toaddrs), subject))

infile = open(filename, "r")
msg += infile.read()
infile.close()

server = smtplib.SMTP(smtpserver)
#server.set_debuglevel(1)
server.ehlo()
server.starttls()
server.login(username,password)
server.sendmail(fromaddr, toaddrs, msg)
server.quit()
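The script above targets the Python 2 interpreter in use at the time (it relies on raw_input). On a Python 3 system, the same steps might be sketched as follows. This is an illustrative alternative and not a tested replacement: the function names build_message and send_message and the example addresses are hypothetical, and the sketch keeps the same assumption that the SMTP server supports STARTTLS.

```python
import smtplib
from email.message import EmailMessage

def build_message(fromaddr, toaddrs, subject, body):
    """Assemble the mail headers and body into one message object."""
    msg = EmailMessage()
    msg["From"] = fromaddr
    msg["To"] = ", ".join(toaddrs)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg

def send_message(smtpserver, username, password, msg):
    """Send msg through smtpserver after upgrading the connection to TLS."""
    with smtplib.SMTP(smtpserver) as server:
        server.starttls()
        server.login(username, password)
        server.send_message(msg)

# Building a message needs no network access; sending is left to the caller.
example = build_message(
    "orca@example.com",
    ["a@example.com", "b@example.com"],
    "URGENT: orca test failures",
    "SUMMARY: 3 SUCCEEDED and 2 FAILED (0 UNEXPECTED) of 5 for example.py\n",
)
print(example["Subject"])
```

Keeping message construction separate from sending makes it possible to check the headers and body without contacting a mail server.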

Set Up the cron Job

Use crontab -e to set up your cron job. Here's an example.

0 0 * * * $HOME/bin/orca_nightly_test 'my.smtp.server' 'myusername' 'mypassword' 'email@address'

Wish List for Nightly Tests

Ideally, we could set up the nightly tests to allow us to determine not only if Orca changes caused regressions in Orca, but also if external components caused regressions. For example, these tests should allow us to at least:

Known Issues

xmodmap -e "keycode  23 = Tab ISO_Left_Tab"
xmodmap -e "keycode  79 = KP_Home KP_7 F27 KP_7 F27"
xmodmap -e "keycode  80 = KP_Up KP_8 F28 KP_8 F28"
xmodmap -e "keycode  81 = KP_Prior KP_9 F29 KP_9 F29"
xmodmap -e "keycode  83 = KP_Left KP_4 F30 KP_4 F30"
xmodmap -e "keycode  84 = KP_Begin KP_5 F31 KP_5 F31"
xmodmap -e "keycode  85 = KP_Right KP_6 F32 KP_6 F32"
xmodmap -e "keycode  87 = KP_End KP_1 F33 KP_1 F33"
xmodmap -e "keycode  88 = KP_Down KP_2 F34 KP_2 F34"
xmodmap -e "keycode  89 = KP_Next KP_3 F35 KP_3 F35"

Note on StarOffice Tests

There are a few things you need to take into consideration with the StarOffice tests:

Test Plan

NOTE: IGNORE THESE. They are here for organizational and historical reference only.

The Orca Test Plan outlines the tests that we want to have for Orca. Ideally, there is a 1:1 mapping between the written tests you find here and the automated tests in the regression test suite. The tests come in two primary forms: one is to test Orca's functionality with the AT-SPI implementation of the toolkit or application in question, the other is to test the script-specific work done for an application.

AT-SPI Implementation Tests

Application Specific Tests



Orca/RegressionTesting (last edited 2009-08-20 17:43:44 by WillieWalker)