The Future of Automated Patch Testing

Over two years of development have led to testing.drupal.org becoming a reality. The testbed has been active for almost two months with virtually no issues related to the automated testing framework. That is not to say the bot has never needed to be disabled, but the issues that forced it offline were unrelated to the automated testing framework itself.

The stability of the framework has led to the addition of over 6 testing servers to the network, with more in the works. Increasing the number of testing servers also increases the workload required to manage the testing network. A fair amount of labor is needed to keep the testing system running smoothly and to watch for bugs introduced into Drupal 7 core.

Having the testing framework in place has saved patch reviewers, specifically the Drupal 7 core maintainers Dries and webchick, countless hours that would otherwise have been spent running tests. The automated framework also ensures that tests are run against every patch before it is committed. Ensuring tests are always run has led to a relatively stable Drupal 7 core that receives updates frequently.

Based on the overwhelmingly positive feedback and personal evaluation of the framework, a number of improvements have been made since its deployment. The enhancements have, of course, led to further ideas which will make the automated testing system much more powerful than it already is.

Server management

A number of additions have been made to make it easier to manage the testing fleet. The major issue that remains is that of confirming that a testing slave returns the proper results. The process currently requires manual testing and confirmation of the results. The process could be streamlined by adding a facility that would send a number of patches with known results to the testing slave in question. The results returned would then be compared to the expected results.

The system could be used at multiple stages during the testing process. Initially when a testing slave is added to the network it would be automatically confirmed or denied. Once confirmed the server would begin receiving patches for testing. Periodically the server would then be re-tested to ensure that it is still functioning. If not the system would automatically disable the server and notify the related server administrator.
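The confirmation step described above can be sketched in a few lines. This is only an illustration and not the code running on testing.drupal.org: the names PatchResult and verify_slave, the shape of a result (pass, fail, and exception counts), and the run_patch callable that stands in for "send this patch to the slave and wait for the result" are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PatchResult:
    """Hypothetical summary of one test run (assumed fields)."""
    passes: int
    fails: int
    exceptions: int


def verify_slave(
    run_patch: Callable[[str], PatchResult],
    known_results: Dict[str, PatchResult],
) -> List[str]:
    """Send each known patch to a slave and list the patches that mismatch.

    An empty list means the slave reproduced every expected result and
    can be confirmed; anything else means it should be denied or disabled.
    """
    mismatches = []
    for patch_name, expected in known_results.items():
        actual = run_patch(patch_name)
        if actual != expected:
            mismatches.append(patch_name)
    return mismatches


if __name__ == "__main__":
    # Patches with results recorded ahead of time (made-up numbers).
    known = {
        "passing.patch": PatchResult(passes=10, fails=0, exceptions=0),
        "failing.patch": PatchResult(passes=8, fails=2, exceptions=0),
    }
    # A stand-in slave that always reports the recorded result.
    healthy_slave = known.__getitem__
    print(verify_slave(healthy_slave, known))
```

The same function could serve both the initial confirmation and the periodic re-test; only the action taken on a non-empty result would differ.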

Having this sort of functionality opens up some exciting possibilities as described below.

Public server queue

Once the automated server management framework is in place the process of adding servers to the fleet could be exposed to the public. A server donor could create an account on testing.drupal.org, enter their server information, and check back for the results of the automated testing. If errors occur the server administrator would be given common solutions in addition to the debugging facilities already provided by the system.

If the server passes inspection an administrator of testing.drupal.org would be notified. The administrator would then confirm the server donation and add the server to the fleet. Once in the fleet the server would be tested regularly like all the other servers. If the donor wishes to remove their server from the fleet they would request removal of their server on testing.drupal.org. The framework would let any tests finish running and then remove the server automatically.
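One way to picture the donation workflow is as a small state machine. The sketch below is an assumption about how it might be modeled, not a description of existing code; the state names and event names are invented for illustration.

```python
from enum import Enum


class State(Enum):
    """Hypothetical lifecycle states for a donated server."""
    SUBMITTED = "submitted"   # donor entered server details
    VERIFIED = "verified"     # automated checks passed, awaiting an admin
    ACTIVE = "active"         # in the fleet and receiving patches
    DRAINING = "draining"     # removal requested, running tests finishing
    REMOVED = "removed"       # no longer part of the fleet


# Allowed (state, event) pairs and the state each one leads to.
TRANSITIONS = {
    (State.SUBMITTED, "checks_passed"): State.VERIFIED,
    (State.SUBMITTED, "checks_failed"): State.SUBMITTED,
    (State.VERIFIED, "admin_confirmed"): State.ACTIVE,
    (State.ACTIVE, "removal_requested"): State.DRAINING,
    (State.DRAINING, "tests_finished"): State.REMOVED,
}


def advance(state: State, event: str) -> State:
    """Return the next state, or raise ValueError for an illegal step."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.value!r}")


if __name__ == "__main__":
    state = State.SUBMITTED
    for event in ("checks_passed", "admin_confirmed",
                  "removal_requested", "tests_finished"):
        state = advance(state, event)
        print(event, "->", state.value)
```

Keeping the transitions in a single table makes it hard for a server to reach the fleet without both the automated checks and the administrator's confirmation.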

A system of this kind would provide a powerful and easy way to increase the testing fleet with minimal burden to the testing.drupal.org administrators. Having a larger fleet has a number of benefits that will be discussed further.

Automated core error detection

Automatically testing an un-patched Drupal HEAD checkout after each commit and confirming that all tests pass would ensure that any human mistakes made during the commit process do not cause false test results. In addition to testing the core itself the detection algorithm would also disable the testing framework if drupal.org is unavailable. Currently when drupal.org goes down the testing framework continues to run which causes errors due to the patches not being accessible. Having this sort of system in place would be a great time saver for administrators and ensure that the results are always accurate.

There is currently code in place for this, but it needs expanding and testing.
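As a rough illustration of the idea (and not the code mentioned above), the decision to pause patch testing could be reduced to a single check. The function name and the two callables are hypothetical; in practice one would wrap an HTTP request to drupal.org and the other a test run against an un-patched HEAD checkout.

```python
from typing import Callable, Optional


def dispatch_block_reason(
    site_reachable: Callable[[], bool],
    head_failures: Callable[[], int],
) -> Optional[str]:
    """Return why patch testing should pause, or None if it is safe to run.

    site_reachable: assumed probe reporting whether drupal.org responds.
    head_failures: assumed probe returning the number of failing tests
        on an un-patched HEAD checkout.
    """
    # Check availability first: without drupal.org no patch can be fetched.
    if not site_reachable():
        return "drupal.org unreachable"
    failures = head_failures()
    if failures > 0:
        return f"un-patched HEAD has {failures} failing test(s)"
    return None


if __name__ == "__main__":
    # Stand-in probes for a healthy site and a broken HEAD.
    print(dispatch_block_reason(lambda: True, lambda: 3))
```

Returning a reason rather than a bare boolean gives administrators something to read when the queue stops.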

Multiple database testing

Drupal 7 currently supports three databases and there are plans to support more. Testing patches on each of the databases is crucial to ensure that no database is neglected. Creating such a system would require a few minimal changes to the testing framework to store results related to a particular database, send a patch to be tested on each particular database, and display the results in a logical manner on drupal.org.
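A minimal sketch of the storage and display side, assuming each patch keeps one pass/fail flag per database. The function name, the dictionary layout, and the database labels used in the example are illustrative assumptions, not the framework's actual schema.

```python
from typing import Dict, Sequence


def summarize(results: Dict[str, bool], databases: Sequence[str]) -> str:
    """Collapse per-database results for one patch into a single status line.

    results maps a database label to True (passed) or False (failed);
    a database missing from results has not reported yet.
    """
    failed = [db for db in databases if results.get(db) is False]
    pending = [db for db in databases if db not in results]
    if failed:
        return "failed on " + ", ".join(failed)
    if pending:
        return "pending on " + ", ".join(pending)
    return "passed on all databases"


if __name__ == "__main__":
    # Illustrative labels only.
    databases = ("mysql", "pgsql", "sqlite")
    print(summarize({"mysql": True, "pgsql": False}, databases))
```

A failure on any one database marks the whole patch as failed, which matches the goal of not neglecting any database.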

Patch reviewing environment

In addition to performing patch testing, the framework could also be used to lower the barrier to reviewing a patch. Instead of having to apply a patch to a local development environment, a reviewer would simply press a button on testing.drupal.org and then be logged into an automatically set up environment with the patch applied.

This sort of system would save reviewers time and would make it much easier for non-developers to review patches, especially for usability issues.

Code coverage reports

Drupal 7 strives to have full test coverage. What that means is that the tests check almost every part of the Drupal core to ensure that everything works as intended. It is rather difficult to gauge the degree to which core is covered without the use of a code coverage reporting utility. Setting up a utility of that kind is no small task and getting results requires large amounts of CPU time.

The testing framework could be extended to automatically provide code coverage reports on a nightly basis. The reports can then be used, as they have been already, to come up with a plan for writing additional tests to fill the gaps.

Performance ranking

Since the tests are very CPU intensive, having a good idea of the performance of a particular testing slave would be useful for ordering which servers are sent patches first. Ensuring that patches are always sent to the fastest available testing server will ensure the quickest turn-around of results. The testing framework could automatically collect performance data and use an average to rank the testing servers.
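The ranking could be as simple as sorting servers by their mean run time. The following is a sketch under that assumption; the function names, the use of seconds as the unit, and the choice to leave servers without any timing data unranked are all illustrative decisions rather than part of the existing framework.

```python
from statistics import mean
from typing import Dict, List, Optional, Set


def rank_servers(durations: Dict[str, List[float]]) -> List[str]:
    """Order servers from fastest to slowest by average run time.

    durations maps a server name to the recorded times (assumed to be
    in seconds) of its past full test runs. Servers with no recorded
    runs are left out of the ranking.
    """
    averages = {name: mean(times) for name, times in durations.items() if times}
    return sorted(averages, key=averages.get)


def pick_server(ranked: List[str], busy: Set[str]) -> Optional[str]:
    """Return the fastest server that is not currently busy, if any."""
    for name in ranked:
        if name not in busy:
            return name
    return None


if __name__ == "__main__":
    # Made-up timings for three hypothetical servers.
    history = {"alpha": [600.0, 620.0], "beta": [300.0, 320.0], "gamma": []}
    ranked = rank_servers(history)
    print(ranked, pick_server(ranked, busy={"beta"}))
```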

Standard VM

Creating a standard virtual machine would have a number of benefits: 1) eliminate most configuration issues, 2) provide consistent results, 3) make the process of setting up a testing slave easier, and 4) make it possible for one testing server to test patches on different databases. Several virtual machines are currently in the works, but a standard one has yet to be agreed upon.

Benefits

Drupal is somewhat unique in having an automated system like the one in place. The system has already proven to be a beneficial tool and with the addition of these enhancements it will become a more integral part of Drupal development and quality assurance. Maintaining the system will be much easier, reviewing core patches will be simpler, and the testing fleet can be increased in size much more easily.

With a larger testing fleet the testing framework can be expanded to test contributed modules. In addition, the framework can be modified to support testing of Drupal 6 and thus enable it to test the large number of contributed modules that have tests written. Having such a powerful tool available to contrib developers will hopefully motivate more developers to write tests for their modules and in doing so increase the stability of the contributed code base.

The automated testing framework is just beginning its life-cycle and has already proven its worth. With enhancements like the ones discussed above, the framework can continue to provide new tools to the Drupal community.

Comments

Patch reviewing environment is a fantastic idea. Making it easy to test patches will do wonders for getting the many old languishing patches out there dealt with.

Out of curiosity what is the current code coverage of core tests?

Thanks for all your work on this. This whole thing is really helping to bring the quality of Drupal head and shoulders above the competition.

Being the maintainer of that at Acquia, i cannot get one build a day to pass, never mind a build for every patch.

My latest error during installation is

SQLSTATE[42S02]: Base table or view not found: 1146 Table 'drupal_coverage.simpletest214041vocabulary_node_types' doesn't exist in default_profile_tasks() (line 148 of /home/buildbot/drupal-coverage/build/profiles/default/default.profile).

and so there it sits until i have time to look at the issue.

So i see the benefits of having a coalition of some sort able to help keep the system running smoothly

re:latest error
turns out there was an error resetting db, so errors were from leftover tables from last run

Testing.drupal.org has been getting passes on patches for several months now.

Here's how i would set this up, tell me how far off I am

1. Bring back simpletest from simpletest.org
2. Identify true unit tests and separate them from system tests
3. Port unit tests to use simpletest.org code
4. Install a continuous integration software system like buildbot
- write custom trigger for when a build should take place
- configure it to accept patches on a port sent from d.o.
- configure it to queue builds to a finite number of slaves
- configure it to notify folks via IRC, email
- publish read-only web browser status
- write custom notifier to update d.o on patches status

5. Configure buildbot to run all unit tests on each patch
6. Configure buildbot to run system tests several times a day based on what's already been committed to CVS. Any failures can be attributed to any/all folks that committed in the time window
7. Allow core maintainers to try any patch against full system tests at their will.

yes, all test slaves should be virtual machines. Yes, because it's 1000 times easier, but also because patches may do unintentional bad things and you wouldn't want anyone volunteering a bare metal machine and have it get destroyed.