You are currently viewing a snapshot of www.mozilla.org taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to www.mozilla.org, please file a bug.



Care and Feeding of the DOM Serializer Tests


by Akkana Peck, akkzilla at shallowsky dot com.

Introduction

The DOM serializers are the code that controls all output from Mozilla. They translate html into plaintext when you send as plaintext from the html mail compose window, produce the plaintext or html you see when you copy something in Mozilla and paste it into another window, and produce the html output from Composer.

Since the output of the serializers has historically been very sensitive to changes, the Mozilla build now includes a set of automated serializer tests which are run as part of the Tinderbox cycle.  This document describes how those tests work, how to add a new test, and what to do if something breaks one of the tests.

Overview of the tests

In a source tree, the serializer tests live in htmlparser/tests/outsinks.  This may seem like a strange place given that the serializers themselves live in content/base/src.  Indeed it is -- it's left over from when the serializers were part of the parser.  When they got moved, the tests didn't move with them. 

The tests consist of several components:
  • The test program, a C++ program called TestOutput (which is run from dist/bin just like the mozilla application itself).  Most of the code implementing this is in Convert.cpp.
  • Input samples, generally named *.html in htmlparser/tests/outsinks.  During a build these are copied or linked to dist/bin/OutTestData.
  • Output samples: generally named *.out in htmlparser/tests/outsinks, copied to dist/bin/OutTestData during a build.  TestOutput takes each input sample, runs it through the appropriate serializer with appropriate flags, then compares the result against one of the output samples.
  • A perl script, TestOutSinks.pl, which contains the master list of tests, input and output files and flags.  This is the script which is run from Tinderbox and which should be run by hand after making changes that might affect the serializer code.

What do I do if I broke the tests and need to fix them, fast?

  • TestOutSinks.pl will print the name of the test that's failing in English -- for example, "Mail quoting test failed."  Read the TestOutSinks.pl script to find out the command corresponding to that message (a command starting with ./TestOutput, for example, ./TestOutput -i text/html -o text/plain -f 2 -w 50 -c OutTestData/mailquote.out OutTestData/mailquote.html).  Run that command by hand from dist/bin to see the output, which should include the offset where the error occurred.
  • Then run that command again, but omit the "-c OutTestData/filename" part to see the actual output. Consider capturing it in a file to make it easier to debug.  Note the filename you're omitting (mailquote.out in this case): this is the "comparison file", and you'll need it later.
  • In your favorite text editor, view both the .out file and the saved output from running the test, search forward the appropriate number of characters from the beginning of the file (in emacs, that's ctrl-U <number> ctrl-f), then compare the saved output with the .out file to see how they differ.
  • If the change is actually correct behavior (it's behaving better now than before), then just modify the comparison file in htmlparser/tests/outsinks/*.out and check that in along with the patch (be sure the reviewer knows about that part of the change and that everybody agrees the change in behavior is appropriate). If it's an error and the old behavior was better, then your patch introduced a bug and should be fixed.
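
The comparison steps above can be sketched as a few shell helpers. This is a minimal sketch, not part of the test suite: the function names are made up for illustration, the TestOutput flags are copied from the mail quoting example above, and it assumes a cmp that reports the first difference as "differ: byte N" or "differ: char N" and a head that accepts -c. The offset cmp reports is 1-based and counted in bytes, so it may not match the number TestOutput prints exactly.

```shell
# Hypothetical helpers for comparing TestOutput's actual output with a .out file.

# Re-run the mail quoting command without its -c comparison file and save
# what the serializer actually produced in the file named by $1.
# Run this from dist/bin.
capture_actual() {
    ./TestOutput -i text/html -o text/plain -f 2 -w 50 \
        OutTestData/mailquote.html > "$1"
}

# Print the 1-based byte offset of the first difference between two files.
# Prints nothing if cmp finds no differing byte (identical files, or one file
# is a prefix of the other, which cmp reports on stderr as EOF).
first_diff_offset() {
    cmp "$1" "$2" | sed -n 's/.* differ: [a-z]* \([0-9][0-9]*\),.*/\1/p'
}

# Show up to 40 bytes of file $1 ending at byte offset $2, so the first
# differing byte is the last one printed.
show_context() {
    head -c "$2" "$1" | tail -c 40
}
```

From dist/bin, one might run capture_actual /tmp/mailquote.actual, then first_diff_offset OutTestData/mailquote.out /tmp/mailquote.actual, and finally show_context on each file with the reported offset to see where the two diverge.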

How do I add a new test?

Thanks for asking!  There are two ways to add a new test: 1. Add new cases to an existing test, or 2. Add an entirely new test.
  1. If you are adding another case which relates to one of the existing tests -- for example, if you have been working on a specific behavior in mail quotation blocks and want to make sure that the current behavior does not regress, but it's not covered in the current mail quotation test -- then just add a block into the existing .html file, add a corresponding block to the appropriate .out file (the easiest way may be just to run the appropriate test and capture the output in the .out file), and check them both in.  You're done!
  2. Adding an entirely new html->plaintext test is slightly more difficult (but still pretty easy).  You will need to
    1. Make a new .html file.
    2. Add an appropriate ./TestOutput line to TestOutSinks.pl.
    3. Generate a new .out file.
    4. Add the new files (.html and .out) to the Makefile.in so they'll be installed into dist/bin/OutTestData.
    5. Check in the new .html, the new .out, and the modified TestOutSinks.pl and Makefile.in.
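
For the second route, the two command lines involved differ only in the -c argument: without it TestOutput shows the actual output, which can be saved as the new .out file, and with it the command is the one to list in TestOutSinks.pl. What follows is a rough sketch under stated assumptions: the test name mytest is hypothetical, the -f 2 -w 50 flags are borrowed from the mail quoting example purely as placeholders, the helper names are invented here, and mytest.html is assumed to be in dist/bin/OutTestData already (copied by hand or by a build). How TestOutSinks.pl wraps each command is not shown; follow the existing entries in the script.

```shell
# Hypothetical helpers that print the TestOutput command lines for a new
# html->plaintext test. $1 is the test name, $2 the serializer flags.

# Command that generates the new .out file (no -c, so the output is shown
# and can be redirected). Move the result into htmlparser/tests/outsinks.
generate_cmd() {
    printf './TestOutput -i text/html -o text/plain %s OutTestData/%s.html > %s.out\n' "$2" "$1" "$1"
}

# Command to list in TestOutSinks.pl (with -c, so the output is compared).
check_cmd() {
    printf './TestOutput -i text/html -o text/plain %s -c OutTestData/%s.out OutTestData/%s.html\n' "$2" "$1" "$1"
}
```

Calling generate_cmd mytest '-f 2 -w 50' and check_cmd mytest '-f 2 -w 50' prints the two lines, which makes it easy to confirm that the flags used to generate the .out file match the ones the test will later be run with.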

What if I want to test a different combination, say, html to html or xml to plaintext?

That used to be easy, but unfortunately a change in the serializer architecture has made it harder, and the TestOutput program no longer automatically supports any type besides html to plaintext.  That doesn't mean it's impossible; it just means the system needs some work and no one has done it yet.

Here's what needs to be done for this to work:
The routine HTML2text currently takes inType and outType parameters (both MIME types), but it returns an error if inType != text/html or outType != text/plain.  That's because its next step is to create an output sink of type NS_PLAINTEXTSINK_CONTRACTID.  Back when all serialization was done through creating parser sinks, that was no problem -- there was a contractid for the html content sink too.  But that all changed when the sinks were turned into serializers, and now there's no contractid that specifically claims to be an html sink.

Is that a problem?  Really all we need is to create a serializer object (serializers can act like parser sinks), so it would probably work to use any contractid that happened to result in an nsHTMLContentSerializer or nsXMLContentSerializer object.  So it's possible that all that is needed is to find the appropriate contractid definitions, add them to the top of Convert.cpp, then use them inside HTML2text depending on what inType and outType are.
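
A first step toward that would be locating the candidate contractid definitions in the tree. Here is a small sketch, assuming a grep that supports -r. The function name is invented, and the search pattern is only a guess at how such definitions are spelled (a macro name containing CONTRACTID on a line that also mentions a serializer), so expect to adjust it.

```shell
# Hypothetical helper: list lines under directory $1 that mention both a
# CONTRACTID macro and the word "serializer" (case-insensitive).
find_serializer_contractids() {
    grep -rn 'CONTRACTID' "$1" | grep -i 'serializer'
}
```

Running find_serializer_contractids content/base/src from the top of a source tree would search where the serializers live; if nothing turns up there, widen the search to a larger part of the tree.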