Part 3: Testing battle plan

I have been meaning to make this post for quite some time. I have posted pieces before and had many discussions, but I have yet to write it out formally. Moshe's recent post about Upal motivated me to finally write the post.

Background

The SimpleTest code for mimicking browser behavior, specifically the form handling, has required a fair amount of upkeep, improvement, and generally wasted resources that could be better applied elsewhere. I attempted to clean things up with the external browser component, but that effort ended up dying. Over time I became more and more convinced that we should revisit the basis for our testing system and try to rebuild things on an existing framework.

You may be wondering why we ever decided to create our own framework in the first place, which is indeed a good question. Back before we had a testing framework in core, the concern with adding the SimpleTest.org library to core was its size. At the time SimpleTest was not ready to depend on PHP 5 and specifically SimpleXML. It soon became clear that we could develop our own internal browser using a combination of PHP 5 tools that ended up being MUCH smaller than SimpleTest.org's implementation. This process was led by the poor assumption that we should commit the testing framework to Drupal core. In hindsight that was probably not the best idea.

Rather than commit a third-party library to core, it should simply be included using a build script (as many other projects do), and Drupal has a system tailored to do exactly that: Drush make. We could even use this approach for jQuery instead of committing the entire library into the core repository. Drupal.org already supports invoking Drush make scripts during release building so the general public wouldn't even notice the difference.

In addition to the size problem there is the issue of bandwidth, which I and many others have discussed many times before. Keeping the Drupal testing integration (with testing library) in contrib would allow it to be maintained and more easily developed while removing an unnecessary burden from Drupal core.

Combination of tools

It is great to see Moshe's post and plan for building atop PHPUnit. Using PHPUnit definitely seems like a great start as it provides integration with many other tools, a familiar API, and using drush removes the "two Drupal sites" problem, but PHPUnit doesn't replace the browser component nor provide a JavaScript testing platform. At this point it seems prudent to build the functional tests atop Selenium, which allows the tests to be written in PHP while eliminating the need for the custom browser component and allowing for JavaScript testing.

I was part of the initial effort to get QUnit into core along with cwgordon7. We had a working version integrated with the test runner, but things were derailed for various reasons which has since spawned the QUnit project on Drupal.org.

Webchick summed it up:

Yeah, ideally we use PHPUnit and QUnit for PHP/JS unit testing, respectively, and Selenium for functional testing. Those are each the best tools for the job.

Selenium has also received some love in the form of a SimpleTest integration project on Drupal.org. We have the makings of a great testing setup, but we need to put the pieces together.

Battle plan

Moving forward it seems prudent to continue maintaining each of the pieces in their respective locations and outside of core.

  • Selenium project needs to refactor to build on Upal
  • Rebuild DrupalWebTestCase as a compatibility layer on top of Selenium
  • Integrate QUnit project with Upal in a similar fashion to Selenium
  • Provide PHPUnit test output parser for testbot
  • Provide a drush make script for testing in core or in a central hub repository in contrib
  • Inventory and refactor Drupal 8 tests to use new system while removing duplication and waste

Let's make this happen!

Part 2: Breathing new life into the testbot

The purpose of this post is to describe the solution that, after careful consideration, seems best suited to alleviating the situation described in the previous post. Other solutions may exist that we have not considered and that will effectively solve the problem. We are open to discussing alternatives and welcome constructive comments on our proposal. At the same time, we discourage negative comments that do not offer a positive alternative. As it is clear the current situation needs improvement, simply dismissing our proposal without offering a better alternative is not useful.

Proposal

ReviewDriven (RD) is a distributed quality assurance platform built to provide a simple yet powerful interface that makes it easy to apply the best practices of continuous integration, test driven development, and automated quality reviews to your development life cycle. The ReviewDriven stack provides a completely rebuilt system designed to take advantage of Drupal 7 and contributed modules that will allow Drupal.org, Drupal development shops, and site owners to take advantage of automated quality assurance. In Drupal terms, RD is the next generation of the testbot (qa.drupal.org).

We would like to see DO and other interested parties take advantage of automated QA tools. Towards that end, we propose that DO engage RD to assume the role of the testbot and provide those same services to the Drupal community.

Advantages

One of the limitations of the current system and one of the primary concerns we addressed with RD is the lack of control over the testing workflow. For example, the current workflow settings apply globally instead of on a more granular basis. In contrast, the RD platform will allow Drupal.org full control to define the workflow and settings to be used with each review. The integration between the testbot (RD) and Drupal.org will continue to be maintained as an open source module which will allow anyone to contribute ideas and changes to the QA workflow on Drupal.org. Since the ReviewDriven stack provides a versioned API, the Drupal.org integration may be maintained and updated independent of RD and on its own schedule. This approach leaves all the control and flexibility in the hands of the Drupal community and shifts the burden of the testbot to RD.

Other challenges faced by the current system were also taken into consideration when building ReviewDriven. The ReviewDriven stack is extremely flexible which in itself solves a number of the current issues and opens up a variety of new options. This opens up the possibility of reviews for things like Coder and code coverage.

The RD stack as a whole is much more maintainable since it is built on as many contributed modules as possible. This keeps the actual codebase much smaller than the current system (PIFR). Depending on contributed modules has led us to suggest and contribute a number of improvements to those modules, and to create other contributed modules. This also implies the system is easily maintainable by the Drupal community in the event we open source the code (since the majority of the code is already maintained by others). The RD architecture is analogous to any other Drupal site (sponsored by a good citizen) in that we are maintaining the code specific to our site while contributing back to the community modifications to existing modules and new features designed as generic modules.

Putting QA in context

Using a house as illustration for building a site on Drupal

Our proposal hinges on the fact that the testbot (and most of Drupal.org) provides a direct benefit to everyone but, just like roads and other infrastructure, the cost needs to be shared. Core and contributed modules provide a direct benefit in an indirect way by reducing the amount of and time spent writing custom code, ensuring the base system works as intended, and allowing you to leverage with confidence that code base while building your site. Core and contributed modules represent the sure foundation upon which you build your house.

There has been plenty of discussion about the important role the testing infrastructure and testing as a whole played in Drupal 7 development. The benefit of QA is also evident by the fact that a number of very large Drupal sites launched on Drupal 7 before it was officially released. The stability and dependability offered by quality assurance testing is something everyone wants.

Drupal.org is one of very few open source projects, much less projects in general, to adopt quality assurance and testing. The Drupal core development process requires new features to have tests and bug fixes to include tests. This workflow is encouraged for contributed projects and has been adopted by many of the more used projects among others. Not only does this help ensure the stability and quality of Drupal and its contributed projects, but in turn serves as a selling point and differentiator for Drupal adoption. Given that QA is both an adoption point and a vital tool for improving Drupal, does it not follow that it makes sense to provide funding for a full-time effort towards its improvement?

Just as many Linux developers are full-time employees paid to work on improving Linux, we seek to work full-time on improving the Drupal ecosystem through quality assurance. We are not the first to be funded full-time to work on Drupal or to be paid to improve Drupal.org. A perfect example is Angie Byron (webchick) who was hired by Acquia to work full time on Drupal. Just as Linux was started by hobbyists and grew into a profession, so too Drupal appears to have outgrown the ability to be maintained entirely by volunteers.

Funding

We see two separate areas that need funding. The first focuses on taking advantage of the ReviewDriven platform by updating the Drupal.org integration with the new testbot (RD). The second area is the ongoing fee for use of the platform (which includes infrastructure costs). RD will use the ongoing fees to improve and maintain the platform (like any other business).

Harnessing the flexibility provided by ReviewDriven will require a large overhaul of the current Drupal.org QA integration module (PIFT). We envision Drupal taking advantage of the granular settings supported by RD to provide per project, per release, per issue, and per patch settings to control the reviews made. Granular settings will ensure that the various workflows, coding standards, and environments that exist in "contrib" can be handled properly. Many projects have different requirements or adhere to different standards between their various releases. The integration with the new testbot would remain open source as Drupal.org integration and can be funded just like any other Drupal.org project.

We would also like to see a QA status advertised on each project page, possibly even some sort of ranking based on a number of quality assurance metrics. These metrics would help people select between similarly featured modules, advertise that we do QA, and help motivate developers to adopt QA. We have many other ideas for improvements and anticipate suggestions from the community.

The ongoing service also requires ongoing funding. Funding to handle infrastructure costs, feature improvements, updates for Drupal core changes, and requests from the Drupal community would ensure that things do not stagnate. We have a vision for new features that would significantly improve the Drupal ecosystem, some of which we have discussed with a few community members.

We envision either the Drupal Association or a group of businesses and other organizations with an interest in Drupal hiring RD as the logical successor to the current testbot. Our business will be to develop and maintain the testbot for use by Drupal.org and other organizations. The same approach can then be applied to other critical pieces of infrastructure such as the improvement of Drupal.org and its maintenance. We would like to pioneer this effort for Drupal to further enhance the process and tools available to Drupal contributors and the community.

Further details about the specifics of the arrangement, details of the improvements, and plans for the future can be found in our formal proposal and addendum.

What to expect during the transition

With both the short-term upgrades and improvements to the Drupal.org integration and the ongoing RD services funded, we see the transition to RD taking shape as follows.

The first stage of the transition will require the update of Drupal.org's integration with the testbot to provide basic connectivity with ReviewDriven. Supporting RD will require a number of changes, both user-facing and behind the scenes. In addition, just using the ReviewDriven platform will enable a number of features and workflows.

Once the initial integration is ready to be tested in production, we suggest both ReviewDriven and the predecessor be run in parallel. Running both systems in parallel will provide the community with a preview of what is to come and an opportunity for feedback. Results from the two systems can be compared to provide a final round of human checks and give people time to adjust to the new system. After the completion of the parallel phase the old testbot will be deactivated and the new system will be given priority.

The second phase involves the larger changes necessary to take advantage of ReviewDriven's features and flexibility. We will start discussions and work on this phase as the initial integration stabilizes.

Some of the exciting features we will expose to DO in one or both of the stages include:

[Figure: example of code coverage during a test run; green = executed, red = not executed, gray = ignored (or non-executable)]

  • Much improved turnaround time with the ability to scale as the test suite grows
  • Code coverage reports from test runs [picture]
  • Testing of sandboxes (both core forks and modules)
  • Support for the developer application process
  • Drush make scripts for retrieving third party libraries.
  • Drush make files in lieu of parsing the project dependencies
  • Execution of arbitrary commands during various stages of the worker processing
  • Automatic enabling of issue retesting
  • Completely automated site reviews (reviews run against a configured site)
  • Reroll of a patch (using git rebase) on one issue after a commit on another issue (for example, this large core change)
  • Display of quality metrics
  • Visible branch test results
  • Forcing a patch to run when a branch is broken (in order to fix the branch)
  • Determine the disruption to other patches that would be caused by another patch (e.g. the patch to move all core files)
  • Run Selenium tests
  • Provide separation between the testbot and the Drupal installer by writing a special script maintained by the community

Conclusion

We look forward to feedback about our proposal and encourage you to voice your opinion. Please be sure to be constructive. In case it's not obvious, we are extremely passionate about doing this. So let's make this happen.

Part 1: The woes of the testbot

The intent of this series of posts is not to blame people, but rather to point out the testbot needs full-time attention. Integral to this story are the decisions and circumstances that led me to stop working on SimpleTest in core and the "testbot" which runs on qa.drupal.org. I intend to follow-up this post with others dealing with rejuvenation of the testbot and improvements to SimpleTest. I understand some will not agree with my position, but I would like everyone to understand my reasons and intentions, and how we find ourselves in the current state of affairs. After everything is out in the open, my hope is that a useful discussion will ensue and meaningful progress will result.

Factors

Four factors led me to stop working on SimpleTest in core and the testbot:

  • I no longer had gratuitous amounts of free time.
  • I now had a need to make a living (and working on the testbot does not generate any income).
  • The core development process being what it is led to burnout and lack of desire.
  • The request to stop working on the testbot in conjunction with the Drupal 7 code freeze.

My absence magnified the fact that no one else worked on the testbot and, going forward, no one stepped up to take my place.

Background

Let's start off with some background about my involvement with the Drupal testing story.

SimpleTest's journey to core

Rewind the clock back to early 2008. I had gotten involved in Drupal through GHOP and became maintainer of SimpleTest. I proceeded to perform a large-scale refactoring and cleaning up of SimpleTest. This, combined with other community efforts, resulted in SimpleTest being added to Drupal 7 core during the Paris Coding Sprint. The rapid pace at which I was able to develop SimpleTest quickly slowed as I no longer had the ability to commit changes nor make design decisions. Instead, even the most trivial changes took days or weeks to get committed. In spite of these additional challenges, I continued to diligently work on SimpleTest in core. To my dismay I discovered on multiple occasions that large changes were virtually impossible to push through the core queue, and I spent countless hours rerolling patches and refactoring code at various developers' whims. In the end, the patches simply died, but not for lack of quality or merit.

SimpleTest Transition to Core Commit Log
The chart shows 37 commits to the SimpleTest project before and after it was added to Drupal core. It is clear the pace of development slowed immediately and lessened further with time.

Changing course, I focused on small changes to SimpleTest in core, but ran into similar throughput issues. For all intents and purposes, my ability to make contributions to SimpleTest had ground to a halt. This led me to write a blog post detailing the problem and possible solutions. I was not alone in my conclusions and many would still like to see the problem resolved. I continued to contribute to core now and then, but I was completely burned out. I even took month-long breaks from Drupal as it literally burned me out to try to make any contribution to core. My burnout was not caused by overwork but was due to frustration with the exaggerated length of time to accomplish a minor commit.

Following up SimpleTest with the testbot

On a parallel track, getting SimpleTest into core turned out to be only half of the battle. Actually seeing the tests adopted and maintained remained a challenge. I led the charge to keep the tests in sync (initially doing it almost alone). The effort to create an automated system for running the tests had been underway for quite some time, but lacked the necessary volunteers and commitment to really get it off the ground. I was then asked to take over the project at which point I evaluated its status and decided to start over. I created PIFR, a plan for realizing the goal, and proceeded to rapidly make progress. Testing.drupal.org launched shortly afterward and testing became an integral part of the Drupal core workflow.

With a working system I then laid plans for a second iteration of the testbot with a number of improvements. After heavy development the second generation of the testing system was launched with a massively improved feature set.

Seeking sponsors

After graduation from high school I was no longer financially able to devote large portions of my time to the testing system or core development so I sought sponsors to enable me to continue my work. Acquia provided an internship that allowed me to focus on testing again. After successfully completing the internship I found a job with Examiner.com that allowed me to spend a portion of my time improving and maintaining the automated testing system and roll out the initial work for contributed project testing and a number of other improvements in ATS (PIFR and PIFT) 2.2. The contributed project testing with dependencies was labeled beta because it did not support specific versions and had known issues. The plan was to make a followup release to solve the issues.

Code freeze and the request to stop

After deploying PIFR 2.2, I was asked to stop making changes to the testbot to ensure stability of the testing system during the final stages of Drupal 7 development. I continued to make improvements that I planned to deploy once the freeze was lifted, but the short freeze turned into months and more months. This delay ultimately forced me to stop development before the codebase diverged too much from the active testbot.

PIFR and PIFT commit log
The chart shows my combined commit activity for PIFR and PIFT and indicates the dramatic slowdown that occurred as a result of the freeze placed on the testbot.

During this time I was the only person who worked on the testbot in any significant capacity (or virtually at all). My availability for working on testing dwindled when my time with Examiner ended. This, combined with the stagnation forced upon the testbot, meant things simply ceased moving forward. The complete stagnation is seen in the long period of time between the 2.2 release and the 2.3 release of PIFR on January 28, 2010 and March 28, 2011, respectively. During that entire period of more than a year no changes were made to the testbot. When changes were finally made, they were done merely out of necessity to accommodate the git migration.

Post-freeze undeployed features

Shortly after the 2.2 release I completed a number of improvements before things came to a stand-still. Some of the recent deployments have included functionality that I had completed, most notably:

  • Version control system abstraction and plugins for bazaar, cvs, git, and svn
  • Coder reviews in addition to testing
  • Beta support for contributed project testing with dependencies

Recent changes

As mentioned above, I had already abstracted the version control handling in the testbot and had four plugins (bazaar, cvs, git, and svn). Unfortunately, there were a number of assumptions that had to be made due to limitations with the project module's VCS integration. These assumptions had to be updated for the shiny new version control API. The changes required were very minor and did not represent any feature improvements, but were simply part of the changes necessary to complete the git migration. Randy Fay made the necessary changes and the testbot saw its first update in a very long time. A few small followups were released as part of the planned phasing out of the old patch format and such. It is interesting to note the other major components of the Drupal.org migration were contracted by the Drupal Association except the automated testing system.

Jeremy Thorson has recently been working on using the testbot's ability to perform coder reviews to help solve the woefully broken project application process which he describes in several blog posts. Again we see change coming to the testbot out of necessity rather than a focused plan for improvement. For those not aware of it, the project application queue has several hundred applications and it takes months to even receive a review. Jeremy has worked hard on improving the application process, at the heart of which is the ability to perform automated coder reviews. Providing automated reviews has been held back on multiple fronts, not the least of which is finding people to get things done. This is a definite hurdle considering that only three people have ever worked on the testbot code itself, not to mention there is an average of less than one active maintainer at any given time.

As mentioned above, I had deployed the first stage of contributed project testing over a year ago, but was forced to shelve the follow-up deployments. The code to properly handle module dependencies fell into disarray with the git migration and required refactoring to work with the version control API. Derek Wright and I spent a lot of time hashing out the details to ensure things were properly abstracted for the project module. I completed the code, but it was never committed and thus was not maintained through the migration. Randy took it upon himself to update the code, but deviated from the agreed upon design. This choice meant the code would not be included in the project module and has a number of other ramifications. The feature was rebuilt in a drupal.org specific manner that precludes others from taking advantage of the code and eliminates the possibility of exposing the data through the update XML information. Exposing the data in that fashion would mean projects like drush, drush make, Aegir and others could discard code written to recreate this data or would now be able to support proper dependency handling. In addition, the recent deployment of dependency handling has led to large delays and instability in the testbot.

Conclusion

The decision to freeze the testbot in conjunction with the Drupal 7 code freeze made sense at the time. However, the extended freeze of the testbot (due to the extended Drupal 7 code freeze) along with moving SimpleTest into core had the unintended and disappointing side effect of causing the effective stagnation of the testing system. The only changes to the testbot in the past 20 months have been made out of necessity and annoyance (the git migration and the unfinished testbot integration with the project application process for new developers). During my tenure with Examiner.com, a fair number of changes were made to the testing system but not deployed on drupal.org. The module dependency code had been written over a year ago and finalized shortly thereafter but languished and was never deployed. Recently, some of these changes were finally deployed along with the git migration. All the while, I had set forth a detailed roadmap for the testing system.

The testing system had been stable and running for 3 years. Recent changes (implemented by others) have resulted in the ups and downs of the testing system. The importance of testing to Drupal development coupled with the recent instability strongly suggests the testing system requires full-time attention. The lack of feature changes since the 2.2 release of PIFR in January, 2010 is a direct result of a lack of financial testing resources, the lock-down of the testing system components, the burnout caused by extreme difficulty to make changes, and the extended freeze placed on the testbot.

Various solutions were tried to enable the continuation of work on the testbot. None represented a viable long-term solution. In the end, my father and I decided the solution was to establish a business to advance testing for the Drupal community and to create an environment where we no longer have our hands tied behind our back. In the next post, I will share the vision and passion we have for testing along with several features that could be made available to the community immediately.

Handling multicolumn and aggregated data using Drupal 7 fields

The following technique and many of the tools were developed/improved by ReviewDriven.

Problems

The field API in Drupal 7 combined with Views is a very powerful combination. Yet there are certain data structures that are difficult or inefficient to work with using the two tools. One such structure is a table of data with multiple columns, which can be stored using the fields API but raises a number of issues.

  • Field data is loaded every time the entity is loaded which can be catastrophic with large datasets.
    • For example, when editing a node with a large dataset attached each value has a corresponding field element generated which eats up memory and processing power quickly and can easily cause white screens.
    • When viewing a node even if the fields are hidden the data is loaded and again is a tax on memory.
  • Relationship between columns is not exported to Views, since it has no way to know, which limits the way values may be displayed.
  • Aggregating columns in Views cannot be done across entities.
  • Views loads field data through the field API which is inefficient for large datasets, but ensures that display formatters and such are respected.

Solutions

Thankfully, there are a number of tools that can be utilized to solve each of these problems and in combination provide a very powerful way to handle tables of data and aggregation across entities.

Field suppress

From the Field suppress project page.

Suppress field data from being loaded during entity_load(). Since field data will not be loaded it will not be displayed nor editable through the interface. This can be handy if you are using an alternate means to display or edit data and/or if you have a large amount of data in fields which will cause the node edit (or similar) interface to use a huge amount of memory and take a very long time to build.

Field suppress solves the first problem, but data will then need to be edited directly through code and displayed manually (e.g. through Views).

Field group views

The next problem requires an interface/API for defining relationships between fields (or columns). Relationships can be defined using Field group views which is a plugin for Field group which provides both a UI and exportables for defining groups of fields. A display group can be defined and the plugin set to Views which will then export the proper relationships to Views and generate a stub view which can then be customized. The view will automatically replace the fields when displaying the entity. A requirement of Field group views is Views field which actually solves the last two problems.

Views field

Views field exposes the field tables and revision field tables to Views as base tables. Using the field base tables means that field data can be loaded directly instead of going through the field API. Loading data directly is much more efficient (especially for large datasets) and allows for aggregation across multiple entities, but loses the formatting capabilities of the fields API (support for formatting could possibly be added to Views field). Formatting can also be added to the exposed field base table through hook_views_data_alter() in the following manner.

<?php
/**
 * Implements hook_views_data_alter().
 */
function conduit_views_data_alter(&$data) {
  // May not always be '_value' depending on field type.
  $data['field_data_MY_FIELD']['MY_FIELD_value']['field']['handler'] = 'views_handler_HANDLER';
}
?>

Please note, there are two bugs that cause annoyances due to changes in the Views API, but do not prevent Views field from working. Feel free to submit patches.

Example

I have utilized these tools in combination on a number of projects with quite satisfying results, but I will attempt to provide a few generic examples to provide a clearer picture of how these tools can be used.

Tallying summary results

If you have a collection of entities and you want to be able to group the overall field data and perform SQL operations like COUNT() or SUM() Views field makes it easy. The core poll module could be rewritten using this technique so we will use poll as an example (could be done many ways).

Say you have a node type for "Foo" poll entries that looks something like the following.

poll_foo_entry

  • poll_foo_value: customizable field capable of storing the value for a poll, in this case let's go with a text field

Results can be calculated using a view with the base table poll_foo_value and aggregation enabled. The poll_foo_value column can be used as the group by column and additional columns can be added for determining the COUNT(). You could even then display the results using a chart plugin for views.

I used this technique to create http://survey.reviewdriven.com/results. The tallied results are displayed on the left.
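
As a rough illustration of what such an aggregated view computes, the following plain PHP sketch groups values and counts them, much like a GROUP BY on the value column combined with COUNT(). The function name tally_values() and the sample answers are made up for this example, and it does not use Views or the field API at all.

```php
<?php
/**
 * Tallies how often each value occurs, similar to GROUP BY with COUNT().
 *
 * Hypothetical helper for illustration only; not part of Views field.
 */
function tally_values(array $values) {
  $counts = array();
  foreach ($values as $value) {
    if (!isset($counts[$value])) {
      $counts[$value] = 0;
    }
    $counts[$value]++;
  }
  // Sort by count, highest first, keeping the value => count association.
  arsort($counts);
  return $counts;
}

// Made-up poll_foo_value entries, one per poll_foo_entry node.
$entries = array('yes', 'no', 'yes', 'maybe', 'yes', 'no');
print_r(tally_values($entries));
```

In a real site the database does this work through the view; the sketch only shows the shape of the result, a map from each distinct value to the number of entries that hold it.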

Table of data

Another powerful use case is storing and displaying a table of data. Let's use a simple example of storing temperature data over time on nodes (possibly a node per region). A possible node structure is as follows.

temperature_history

  • title: region or some such
  • date: multivalue date field
  • temperature: multivalue temperature field

The date and temperature fields can then be related using Field group views by placing them in the same group and displayed using a view. The fields are related based on each field's delta. In other words, the data is stored using the field API in the following manner.

date[0] = 2011-09-01
date[1] = 2011-09-02
date[2] = 2011-09-03
date[3] = 2011-09-04
date[4] = 2011-09-06
 
temperature[0] = 70
temperature[1] = 69
temperature[2] = 68
temperature[3] = 67
temperature[4] = 66

The number in brackets is the delta, which allows the tables to be joined intelligently and produce a table like the following.

Date Temperature
2011-09-01 70
2011-09-02 69
2011-09-03 68
2011-09-04 67
2011-09-06 66
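
As a rough illustration of the delta-based join, the following plain-PHP sketch pairs the two value lists by their array keys. It assumes an inner join (only deltas present in both fields produce a row); the actual join is built by Views in SQL, so treat this as a model of the idea rather than the module's implementation.

```php
<?php
// Field values keyed by delta, matching the example data above.
$date = array(
  0 => '2011-09-01',
  1 => '2011-09-02',
  2 => '2011-09-03',
  3 => '2011-09-04',
  4 => '2011-09-06',
);
$temperature = array(0 => 70, 1 => 69, 2 => 68, 3 => 67, 4 => 66);

// Join on delta: keep only the deltas that exist in both fields.
$rows = array();
foreach (array_intersect_key($date, $temperature) as $delta => $value) {
  $rows[] = array('date' => $value, 'temperature' => $temperature[$delta]);
}

foreach ($rows as $row) {
  print $row['date'] . ' ' . $row['temperature'] . "\n";
}
```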

Improvements

There are still some areas for improvement, but one only has so much time.

Per-Bundle Storage

Storing related field data in a single table allows for the removal of overlapping columns and removes the need to join multiple tables. A field group could then be placed in a separate bundle or otherwise exposed to PBS and stored together in a single table. The database schema would then be much more manageable and easier to query manually.

Please keep in mind that PBS is not currently functional.

Editing

Being able to edit the dataset using a view would also be a major plus. There are a number of modules that provide functionality of this sort, but not in such a flexible manner. Something like Editview would complete the functionality of Field group views. Large datasets could then be paginated for editing to prevent overload.

Exciting usage

Given that Views is pluggable, virtually any data structure could be displayed using this technique. The advantage over writing a one-off field is that the structure is then easy to extend and modify, and requires little to no code to create. Simply export the fields and field group definitions.

Fields such as the Name field could be turned into a collection of fields displayed using a view and editable (assuming that gets implemented) using a similar structure.

The possibilities opened up by this technique are quite exciting and I look forward to seeing what people come up with.

Managing upstream integrations and releases

The AWS SDK for PHP Drupal integration project, which I maintain, provides releases that correspond to each release made by Amazon. Having releases that correspond to an upstream source is something that you see in Linux packaging systems and offers some interesting bonuses. In the Linux distribution model, individuals do not monitor upstream for changes; packages are rebuilt and their versions incremented regardless of changes to the packaging scripts themselves. Given that Drupal distributions, and even individual site builds, mimic Linux distributions in many ways, I think the reasons behind making corresponding releases are interesting and worth consideration.

To clarify what making releases that correspond to upstream releases means, consider the following from the AWS SDK for PHP project page.

Since Drupal.org does not allow for three-part version numbers this project follows the AWS SDK for PHP 1.x development line and the release version mapping is as follows. The mapping shows the Drupal release to the Amazon release. Please ensure you are using a Drupal module and Amazon SDK pair that match the version number mapping. For example, use the 4.1 Drupal module with the 1.4.1 Amazon SDK.

  • 4.x -> 1.4.x
  • 3.x -> 1.3.x
  • 2.x -> 1.2.x
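
The mapping is mechanical enough to express in a few lines. The helper below is purely illustrative: the function name is made up and is not part of the AWS SDK module, and it only assumes the two-part Drupal version strings described above.

```php
<?php
/**
 * Hypothetical helper: maps a Drupal release (e.g. "4.1") to the matching
 * Amazon SDK 1.x release (e.g. "1.4.1"). Returns FALSE for other input.
 */
function example_awssdk_upstream_version($drupal_version) {
  if (!preg_match('/^(\d+)\.(\d+)$/', $drupal_version, $matches)) {
    return FALSE;
  }
  // Prefix the 1.x development line to the two Drupal version parts.
  return '1.' . $matches[1] . '.' . $matches[2];
}

print example_awssdk_upstream_version('4.1') . "\n"; // Prints 1.4.1
```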

Making a release whenever upstream makes a release means that some releases may end up with little or no changes to the actual Drupal module. The AWS SDK module provides a Drush Make file that is updated each release to point to the matching upstream release, but that is the only change for some of the releases.

Making arbitrary releases may sound foolish at first, but consider some of the advantages.

  • Drupal site maintainers and distribution developers can receive update notifications from the standard Drupal core update system instead of having to monitor upstream for releases. Making it easier to watch for updates encourages keeping up-to-date with upstream changes, which has implied benefits.
  • Any incompatibilities with a specific version of a third-party system or upstream library do not need to be kept track of, since the integration project is always used with a specific upstream release.
  • New features or configuration options from an upstream source may be added immediately to the integration module without worrying about incompatibility with older releases of the upstream source. If a configuration variable changes name or a new option is added, it can be exposed through a Drupal UI without worrying whether people are using the appropriate upstream version (since they are always intended to work with a matching release).
  • When building a site or distribution using Drush Make, specific versions of upstream libraries can be easily included using the corresponding Drupal project with a proper make file. If one wants AWS SDK 1.2.6, simply add projects[awssdk] = 2.6 instead of having to include the Drupal project and override any third-party generic make script.
  • Handling of updates, when necessary, is much simpler since you always know the library version that was used with the previous release.

The general pattern that seems to emerge is that upstream sources integrate better into Drupal workflow, tools, and infrastructure when matching releases are made.

It seems that few, if any, other third-party or library integrations on drupal.org follow this workflow. Making corresponding releases may not be appropriate in all situations, but I think it warrants consideration and I am interested to hear thoughts on the subject.

Reflections on Drupal Quality Assurance

We recently launched ReviewDriven, a distributed quality assurance platform, which is the culmination of months of work and knowledge gained working with Drupal and its community over the past 3+ years. I look forward to feedback from the community regarding ReviewDriven, and to being able to fund further development of this service while, at the same time, improving the Drupal quality assurance ecosystem.

Since ReviewDriven is a major event in my life and Drupal career it caused me to reflect on how I got to this point. From my humble beginnings in the Google Highly Open Participation Contest (GHOP), where my testing roots were planted, I received encouragement and mentoring which inspired me to continue working with open source. Previously, I had been interested in contributing to open source, but had never found a place to plug in. Since my start with Drupal I have contributed in a variety of ways to openSUSE, the Linux kernel, and KDE.

After my initial introduction to Drupal, I was received with enthusiasm and the community helped me get to Drupalcon Boston 2008. I took part in the GHOP and SimpleTest presentations. During the coding sprint after the conference, I was approached by Kieran Lal who offered to help me get to a testing sprint in Paris. Again the Drupal community along with some help from Google made it possible for me to attend. Not only was it my first time out of the United States, but I got to spend a few days working closely with some of Drupal's best. During the sprint Drupal took a major step towards realizing automated testing with the introduction of SimpleTest (or rather a fork) into Drupal core.

In the months after the sprint I pushed hard to maintain, add to, and improve the tests in core. At the time, patches were committed without much thought given to the tests, so keeping the tests passing was a full-time job. After discussions with Kieran I ended up taking over the testing.drupal.org (now qa.drupal.org) effort. After a radical redesign and plenty of work we managed to deploy testing.drupal.org and enable integration with the issue queue once we finally got all the tests passing. With the integration also came the adoption of the current "tests always pass" ideology and the requirement to include tests with patches, which has revolutionized Drupal development. The system even caught some interesting drupal.org bugs.

Again thanks to community support I was able to attend Drupalcon DC and give a presentation with Kieran on the testing saga. The conference was a lot of fun in general and gave me a chance to meet all the people who I had been working with fairly regularly. Later that year I was accepted to Google Summer of Code (GSoC) for the second time and I worked part-time for Acquia as an intern over the same summer. After an exciting summer, with help from the community, I attended Drupalcon Paris 2009 where I gave another presentation on SimpleTest and the automated testing system with Kieran. After a productive Drupalcon we deployed the second version of the automated testing system and continued to improve the system.

Before the end of the year I was hired full-time by Examiner to lead their quality assurance effort. The opportunity provided me with first-hand experience on how quality assurance can fit into an enterprise Drupal development workflow. Additionally, I was able to spend a portion of my time improving the automated testing system so that we could enable partial testing of contributed projects. Examiner required a slightly different approach to testing, which formed the basis for SimpleTest 7.x-2.x. Examiner sponsored the development team to attend Drupalcon San Francisco 2010, during which I gave a talk at the Core developer summit on Quality Assurance in Drupal 8, a SimpleTest presentation, and a productive BoF on testing.

In addition to the specifics mentioned I have been blessed to work with and learn from many skilled Drupal developers, and to contribute to Drupal core and contrib which has further refined my skills. My Drupal career has been a great learning experience in addition to being fun and exciting. I look forward to continued involvement with and support from the Drupal community!

Drupal 8 thoughts: configuration management and improved installation

I have been doing a lot of work related to easing the process of building a site from scratch on an individual machine and thus dabbling with configuration management and related topics. Since configuration management is one of the Drupal 8 key initiatives I figured I would share some thoughts I had on the outer fringes of the topic with more to come in the future.

Installation

One area of Drupal that has always seemed a bit odd to me has been the installation process/system. The process can play a key part in configuration management across machines, and it made it impossible to implement a basic environment system during installation without hacking core. To remedy this issue I believe the installation system can be much improved, simplified, and made much more consistent with the rest of Drupal, which will in turn make it easy to implement my environment system in contrib or core.

The installation system in Drupal 7 was refactored/rewritten quite a bit and has thus been quite improved, but I think the direction of the installation system could be changed to make it much better. Currently the installer attempts to fake systems in core, like the cache, and actually duplicates a lot of code found elsewhere for module management and the like. Why not simply package a minimal database dump, similar to how the update tests used a Drupal 6 dump written in DBTNG, that can be installed to create an extremely minimal Drupal installation? At that point the installer can act like any other module and provide forms to complete the process. All modules in addition to the required modules can be installed through the standard process invoked from the modules page, but done in an automated fashion through the installer.

The reasons why this approach is beneficial are: 1) install.inc and install.core.inc could be virtually removed, 2) profiles would no longer be "hackish" during the install phase (solving the current issues related to dependency resolution and the like not being the same for modules and profiles), 3) hooks like hook_system_info_alter() would work properly for profiles and modules during early installation phases, and 4) it would remove race conditions in general caused by maintaining two sets of the same code.

Environments

In addition, my environment.module would work without hacking core. This concept is nothing new in the development world, but is something I feel would be a great candidate for Drupal 8.

<?php
/**
 * Implements hook_system_info_alter().
 */
function environment_system_info_alter(&$info, $file, $type) {
  if (!empty($info['dependencies'])) {
    $environment = environment_get();
    if (!empty($info['dependencies'][$environment])) {
      $info['dependencies'] = array_merge($info['dependencies'], $info['dependencies'][$environment]);
    }
    foreach ($info['dependencies'] as $key => $dependency) {
      if (!is_numeric($key)) {
        unset($info['dependencies'][$key]);
      }
    }
  }
}

/**
 * Get the current environment.
 *
 * @return
 *   The current environment: production, staging, or development.
 */
function environment_get() {
  return variable_get('environment', 'production');
}
?>

The above code allows for the environment to be configured in the settings.php file for a site.

<?php
$conf['environment'] = 'development';
?>

During installation a profile (or module) can perform different tasks or conditional code based on the environment. For example, generated users can have a simple password on a development machine and complex ones on production or staging machines.

<?php
function my_profile_install() {
  if (environment_get() == 'development') {
    // Do cool stuff that only devs get to see.
  }
}
?>

Another cool feature, which I have a use case for on a site I am working on and which seems generally useful, is enabling modules based on the environment. Instead of having to do that in hook_install() or similar, it makes sense to have a way to specify that in a .info file. The above code allows for the following.

name = My profile
description = ....
version = 0.1
core = 7.x
 
dependencies[] = block
dependencies[] = dblog
 
dependencies[development][] = views_ui
dependencies[development][] = field_ui
dependencies[development][] = devel
 
dependencies[production][] = integration_with_third_party
dependencies[staging][] = integration_with_third_party

Having this type of functionality in core would hopefully encourage better development practices and seems like a great feature to have. I have a number of scripts that, in combination with drush make and the above environment utility, allow me to build out a fully functional site on a new box with a single command. I plan to clean up the scripts, document them, and provide them in a follow-up post. As always I would love to hear your thoughts on this subject.

Git bisect saves the day

I have been working on a private project for quite some time using git. Yesterday, I noticed that one of the views on the site was taking around 10 seconds to generate instead of less than a second like it used to. I scanned through the recent commits, scratched my head, and messed with a bunch of stuff, but to no avail. I reverted to a commit from over a month ago just for kicks and sure enough the view rendered quickly again. So I decided it was time to learn how to use git bisect.

From git documentation:

Find by binary search the change that introduced a bug

I had read in passing the general idea behind git bisect and that you could use a script that returned pass or fail to automate the process. After reading through the man page I confirmed that a script may be used and simply needs to return exit status code 0 for pass and 1 for fail. So I figured I could write a script that manually executes the view and checks whether the amount of time required was over a certain threshold. Interestingly, it seems the view executes much faster using the script than inside a normal page request. Thus the threshold used is much lower than one might expect.

I created two files since I wanted to use drush php-script and follow the documentation's recommendation by placing the scripts outside the repository. The first script is a wrapper that simply changes to the directory in which Drupal is installed and then executes the drush command.

git_bisect.sh

#!/bin/bash
cd /path/to/drupal
drush php-script ~/check_view.php

check_view.php

<?php
drupal_flush_all_caches();

$view = views_get_view('MY_CUSTOM_VIEW');
$view->set_arguments(array(3)); // Test data.
$start = microtime(TRUE);
$view->execute();
$stop = microtime(TRUE);

// 0: pass, 1: fail
$diff = $stop - $start;
$status = $diff > 0.3 ? 1 : 0; // Threshold.

var_dump($diff);
var_dump($status);

// If exit(0) is called drush still views it as abnormal shutdown and sets code
// to non-zero so only call when we want abnormal shutdown.
if ($status != 0) {
  exit(1);
}
?>
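
Besides 0 for good and 1 for bad, git bisect run also treats exit code 125 as "this revision cannot be tested, skip it", which can be handy when some revisions will not run at all. The following is a minimal sketch of that decision logic in plain PHP, independent of Drush; example_bisect_exit_code() and the callbacks are hypothetical stand-ins for executing the view, not something I used in the run below.

```php
<?php
/**
 * Hypothetical helper: times a callback and picks a git bisect exit code.
 *
 * @return int
 *   0 (good), 1 (bad: slower than the threshold), or 125 (skip: the callback
 *   could not run on this revision).
 */
function example_bisect_exit_code($callback, $threshold) {
  try {
    $start = microtime(TRUE);
    call_user_func($callback);
    $elapsed = microtime(TRUE) - $start;
  }
  catch (Exception $e) {
    // The revision is untestable, so ask git bisect to skip it.
    return 125;
  }
  return $elapsed > $threshold ? 1 : 0;
}

// Stand-in for executing the view: sleeps for 20 milliseconds.
$slow = function () {
  usleep(20000);
};
print example_bisect_exit_code($slow, 0.01) . "\n"; // Prints 1
```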

My case was a bit more complex, since code beyond a certain point was incompatible and I had to back up a related module to work with old revisions.

078d60ad18b73ec356436a7ea30528c95c9c4844 (bad)
3f1cfca83821a6b2d694cf228e5d8af3db20922f (good)

I ran the following inside the repository directory.

git bisect start 078d60ad18b73ec356436a7ea30528c95c9c4844 3f1cfca83821a6b2d694cf228e5d8af3db20922f --
git bisect run ~/git_bisect.sh

I ended up with the following result (-- indicates where I scrubbed data for privacy).

running /home/boombatower/git_bisect.sh
float(0.43191289901733)
int(1)
Drush command terminated abnormally due to an unrecoverable error.
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[50dcca7e9cec514c2bcc24156cd8b4622eb2cd3e] -- message --
running /home/boombatower/git_bisect.sh
float(0.53287100791931)
int(1)
Drush command terminated abnormally due to an unrecoverable error.
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[37e793d693a75a55470e6a92f5e3f30649ee2214] -- message --
running /home/boombatower/git_bisect.sh
float(0.18644404411316)
int(0)
Bisecting: 1 revision left to test after this (roughly 1 step)
[77ddb40b26fe2436cd7a15549109ffa9095d6995] -- message --
running /home/boombatower/git_bisect.sh
float(0.51487994194031)
int(1)
Drush command terminated abnormally due to an unrecoverable error.
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[08d99c98f4b9d837775db47770bc125727d93dc6] -- message --
running /home/boombatower/git_bisect.sh
float(0.1781919002533)
int(0)
77ddb40b26fe2436cd7a15549109ffa9095d6995 is the first bad commit
commit 77ddb40b26fe2436cd7a15549109ffa9095d6995
Author: --
Date:   --
 
    -- message --
 
:040000 040000 a4d3a8cb990d0eff7a7dd87c941a3f12b768feaf 10ab97d60a9b03a56862828e0f9f66cf7f4ef6b4 M      --
:100644 100644 7254a3027edfb35e10b948a9dcd994a9fbdd44a3 0a8ef37931373929328fe6458bf1f595549d265a M      --
bisect run success

Sure enough the "first bad commit" was indeed the commit that caused the performance issue. Very cool!

Drush, Drush Make, other Drupal packages, and development setup for openSUSE

I have recently added drush and drush make packages to my openSUSE repository. For more information or to report bugs on the packages please visit their respective project pages: drush and drush_make.

To install the packages you can use the one-click installers provided by the build service or manually add my repository and install the packages as shown below.

su
zypper ar http://download.opensuse.org/repositories/home:/boombatower/openSUSE_11.[2 or 3]/ home:boombatower
zypper in drush drush_make

Also note my existing Drupal packages: drupal-dev and drupal-vhosts, as well as the LAMP Drupal one-click pattern. The latter package (drupal-vhosts) is very useful in setting up a multi-drupal version, multi-subdomain work environment.

To use it, simply install the package and run the command to point the virtual hosts to the directory containing your Drupal code.

su
zypper in drupal-vhosts
drupal-vhost /path/to/main/software/directory

Either edit the hosts file directly or use YaST -> Network Services -> Hostnames to add an entry for every Drupal version you wish to run (package currently supports 6, 7, and 8). The relevant lines from my /etc/hosts file are as follows.

127.0.0.1       d7x.loc 
127.0.0.1       d6x.loc 

For my setup I use /home/boombatower/software for all my code with Drupal cores in drupal-7 and drupal-6 directories respectively. If you want to have subdomains for your sites just add more entries to /etc/hosts and use the respective Drupal sites directories.

Personally, I then create symbolic links to all my modules so that the code resides in the root of the software directory, but can be used by any respective site. This makes the paths to modules much shorter and easier to reference from multiple sub-sites. For example, to link pathauto to the all modules directory for Drupal 7 I would execute the following.

ln -s ~/software/pathauto ~/software/drupal-7/sites/all/modules

Or from within the sites/all/modules directory as I tend to do.

ln -s ~/software/pathauto .

Also note, to enable mod_rewrite and get clean URLs to work simply go to YaST -> System -> /etc/sysconfig Editor then Network -> WWW -> Apache 2 -> APACHE_MODULES and add rewrite to the end of the line. You can do so manually of course as well.

In order for the virtual host changes and Apache module addition to take effect you will need to restart Apache, and for the /etc/hosts changes you need to restart the network. Both can be done with the following commands run as root.

rcapache2 restart
rcnetwork restart

The end result of all this work is beautiful URLs like: http://d7x.loc/node/1, http://foo.d7x.loc/user, and http://d6x.loc/.

I also create a similar structure within MySQL. First, I set an easy to remember MySQL root password since there is really no reason for it not to be easy to remember and it is helpful when having to enter it a lot.

mysqladmin -u root password EASY_TO_REMEMBER_PASSWORD

Next, set up a drupal user in MySQL and give the user all permissions to d7x* and d6x* named databases, which allows us to use a single user for all our Drupal sites (much easier to remember login info) without having to update privileges all the time. I name my databases the same as virtual hosts, so for d7x.loc I would have d7x as the database name and for foo.d7x.loc I would have d7x-foo.

CREATE USER 'drupal'@'localhost' IDENTIFIED BY  'EASY_TO_REMEMBER_PASSWORD';
GRANT USAGE ON * . * TO  'drupal'@'localhost' IDENTIFIED BY  'SAME_EASY_TO_REMEMBER_PASSWORD' ;
 
GRANT ALL PRIVILEGES ON  `d7x%` . * TO  'drupal'@'localhost';
GRANT ALL PRIVILEGES ON  `d6x%` . * TO  'drupal'@'localhost';

Anytime you want to add a database for a new site simply run the following.

CREATE DATABASE  `DATABASE_NAME` ;
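
The naming convention described above (d7x for d7x.loc, d7x-foo for foo.d7x.loc) is simple enough to script. The helper below is a hypothetical sketch rather than part of any package mentioned here, and how it orders deeper subdomains is my own assumption.

```php
<?php
/**
 * Hypothetical helper: derives a database name from a .loc virtual host.
 */
function example_database_name($host) {
  // Strip the .loc suffix and split into labels: foo.d7x.loc -> foo, d7x.
  $labels = explode('.', preg_replace('/\.loc$/', '', $host));
  // The Drupal version label goes first, followed by any subdomains.
  return implode('-', array_reverse($labels));
}

print example_database_name('d7x.loc') . "\n";     // Prints d7x
print example_database_name('foo.d7x.loc') . "\n"; // Prints d7x-foo
```

Names containing a hyphen need to be quoted with backticks in MySQL, as in the CREATE DATABASE statement above.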

Enjoy your fancy development environment!
