The woes of the testbot

For those not familiar with me, a little research should make it clear that I am the person behind the testbot deployed in 2008, which has revolutionized Drupal core development and stability, and which for six years has run tens of thousands of assertions against each patch submitted to core and to many contributed modules.

My intimate involvement with the testbot came to a rather abrupt and unintended end several years ago due to a number of factors (of which only a select few members of this community are fully aware). After several potholes, detours, and bumps in the road, it became clear to me that maintaining and enhancing the testbot under the policies and constraints imposed upon me was impossible.

Five years ago we finished writing an entirely new testing system, designed to overcome the technical obstacles of the current testbot and to introduce new features that would enable an enormous improvement in resource utilization that could then be used for new and more frequent QA.

Five years ago we submitted a proposal to the Drupal Association and key members of the community for taking the testbot to the next level, built atop the new testing system. This proposal was ignored by the Association and never evaluated by the community. The latter is quite puzzling to me given:

  • the importance of the testbot
  • the pride this open source community has in openly evaluating and debating literally everything (a healthy sentiment especially in the software development world)
  • the years of my life I had already freely dedicated to the project

The remainder of this read will:

  • list some of the items included in our proposal that were dismissed with prejudice five years ago, but since have been adopted and implemented
  • compare the technical merits of the new system (ReviewDriven) with the current testbot and a recent proposal regarding "modernizing" the testbot
  • provide an indication of where the community will be in five years if it does nothing or attempts to implement the recent proposal.

This read will not cover the rude and in some cases seemingly unethical behavior that led to the original proposal being overlooked. Nor will this cover the roller coaster of events that led up to the proposal. The intent is to focus on a technical comparison and to draw attention to the obvious disparity between the systems.

About Face

Things mentioned in our proposal that have subsequently been adopted include:

  • paying for development primarily benefiting drupal.org instead of clinging to the obvious fallacy of "open source it and they will come"
  • paying for machine time (for workers) as EC2 is regularly utilized
  • utilizing proprietary SaaS solutions (Mollom on groups.drupal.org)
  • automatically spinning up more servers to handle load (e.g. during code sprints) which has been included in the "modernize" proposal

Comparison

The following is a rough, high-level comparison of the three systems that makes clear the superior choice. Obviously, this comparison does not cover everything.

The three systems compared are the baseline (the current qa.drupal.org), a backwards modernization (the "modernize" proposal), and a true step forward (ReviewDriven).

Status
  • Current qa.drupal.org: has been running for over 6 years
  • "Modernize" proposal: does not exist
  • ReviewDriven: existed 5 years ago at ReviewDriven.com

Complexity
  • Current qa.drupal.org: custom PHP code and Drupal; does not make use of contrib code
  • "Modernize" proposal: mish mash of languages and environments (ruby, python, bash, java, php, several custom config formats, etc.); will butcher a variety of systems from their intended purpose and attempt to have them all communicate; adds a number of extra levels of communication and points of failure
  • ReviewDriven: minimal custom PHP code and Drupal; uses commonly understood contrib code like Views

Maintainability
  • Current qa.drupal.org: learning curve, but all PHP
  • "Modernize" proposal: languages and tools not common to Drupal site building or maintenance; a vast array of systems to learn and the unique ways in which they are hacked
  • ReviewDriven: less code to maintain and all of it familiar to Drupal contributors

Speed
  • Current qa.drupal.org: known; gets slower as the test suite grows due to serial execution
  • "Modernize" proposal: still serial execution, and probably slower than current since each separate system adds additional communication delay
  • ReviewDriven: an order of magnitude faster thanks to concurrent execution; limited only by the slowest test case (*see below)

Extensibility (plugins)
  • Current qa.drupal.org: moderately easy, but does not utilize contrib code so requires knowledge of the current system
  • "Modernize" proposal: several components, one on each system used; new plugins will have to pass data through or tweak any of the layers involved, so writing a plugin may involve a variety of languages and systems and thus a much wider breadth of required knowledge
  • ReviewDriven: much easier as it heavily uses common systems like Views; plugin development is almost entirely common to Drupal development (define storage: Fields; define display: Views; define execution: CTools function on worker), and all PHP

Security
  • Current qa.drupal.org: runs as the same user as the web process
  • "Modernize" proposal: many more attack surfaces, which require proper configuration
  • ReviewDriven: daemon to monitor and shut down job processes; lends itself to a Docker style with added security

3rd party integration
  • Current qa.drupal.org: basic RSS feeds and a restricted XML-RPC client API
  • "Modernize" proposal: unknown
  • ReviewDriven: full Services module integration for a public, versioned, read API, with write access for authorized clients

Stability
  • Current qa.drupal.org: when not disturbed, has run well for years; primary causes of instability include ill-advised changes to the code base; temporary and environment reset problems are easily solved by using Docker containers with the current code base
  • "Modernize" proposal: unknown, but multiple systems imply more points of failure
  • ReviewDriven: same number of components as the current system; Services versioning allows components to be updated independently; far less code, as the majority depends on very common and heavily used Drupal modules which are stable; 2-part daemon (master can react to misbehaving jobs); a Docker image could be added with minimal effort since the system (which predates Docker) is designed with the same goals as Docker

Resource utilization
  • Current qa.drupal.org: entire test suite runs on a single box and cannot utilize multiple machines for a single patch
  • "Modernize" proposal: multiple servers with unshared memory resources due to the variety of language environments; same serial execution of test cases per patch, which does not optimally utilize resources
  • ReviewDriven: an order of magnitude better due to concurrent execution across multiple machines; completely dynamic hardware that takes full advantage of available machines (*see below)

Human interaction
  • Current qa.drupal.org: manually spin up boxes; reduce load by turning on additional machines
  • "Modernize" proposal: intended to include automatic EC2 spin up, but does not yet exist; more points of failure due to multiple systems
  • ReviewDriven: additional resources are automatically turned on and utilized

Test itself
  • Current qa.drupal.org: tests could be run on a development setup, but not within the production testbot
  • "Modernize" proposal: unknown
  • ReviewDriven: yes, due to a change in worker design. A testbot inside a testbot! Recursion!

API
  • Current qa.drupal.org: does the trick, but custom XML-RPC methods
  • "Modernize" proposal: unknown
  • ReviewDriven: highly flexible input configuration, similar to systems built later like travis-ci; all entity edits are done using the Services module, which follows best practices

3rd party code
  • Current qa.drupal.org: able to test security.drupal.org patches on a public instance
  • "Modernize" proposal: unknown, but not a stated goal
  • ReviewDriven: supports importing VCS credentials, which allows testing of private code bases and thus supports the business aspect of providing a service and being self sustaining; results and configuration are permissioned per user, allowing drupal.org results to be public on the same instance as private results

Implemented plugins
  • Current qa.drupal.org: Simpletest, coder
  • "Modernize" proposal: none exist
  • ReviewDriven: Simpletest, coder, code coverage, patch conflict detection, reroll of patch, backport of patch to previous branch

Interface
  • Current qa.drupal.org: well known; designed to deal with the display of several 100K distinct test results; lacks revision history; display uses a combination of custom code and Views
  • "Modernize" proposal: unknown, as it is being built from scratch and has not begun; Jenkins cannot support this interface (in Jenkins terminology, multiple 100K jobs) so it will have to be written from scratch (as the proposal confirms, and which was the reason for avoiding Jenkins in the past); Jenkins was designed for small instances within businesses or projects, not a large central interface like qa.drupal.org
  • ReviewDriven: hierarchical results navigation from project, branch, issue, patch; context around failed assertions (like diff -u); minimizes clutter and focuses on the results of greatest interest (e.g. failed assertions); entirely built using Views so highly customizable; simplified to help highlight pertinent information (even icons to quickly extract status); capable of displaying partial results as they are concurrently streamed in from the various workers

Speed and Resource Utilization

Arguably one of the most important advantages of the ReviewDriven system is concurrency. Interestingly, after seeing inside Google I can say this approach is far more similar to the system Google has in place than to Jenkins or anything else.

Systems like Jenkins, and especially travis-ci, stay generic and simple by not attempting to understand the workload being performed. For example, Travis simply asks for commands to execute inside a VM and presents the output log as the result. Contrast that with the Drupal testbot, which knows the tests being run and what they are being run against. Why is this useful? Concurrency.

Instead of running all the test cases for a single patch on one machine, the test cases for a patch may be split into separate chunks. Each chunk is processed on a different machine and the results are returned to the system. Because the system understands the results, it can reassemble the chunked results in a useful way. Instead of an endlessly growing wait time as more tests are added, and instead of having nine machines sit idle while one machine runs the entire test suite, all ten can be used on every patch. The wait time effectively becomes the time required to run the slowest test case. Instead of waiting 45 minutes one might wait only a minute. The difference becomes more exaggerated over time as more tests are added.
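To make the math concrete, here is a minimal PHP sketch of the chunking idea (the test names, durations, and worker count are invented for illustration; this is not the actual ReviewDriven scheduler):

<?php
// Minutes each (hypothetical) test case takes to run.
$tests = array('node' => 5, 'user' => 4, 'field' => 6, 'form' => 3, 'file' => 2,
  'menu' => 2, 'taxonomy' => 3, 'comment' => 4, 'search' => 5, 'locale' => 1);
$workers = 10;

// Greedy scheduling: hand the longest remaining test to the least-loaded worker.
arsort($tests);
$load = array_fill(0, $workers, 0);
foreach ($tests as $name => $minutes) {
  $least = array_search(min($load), $load);
  $load[$least] += $minutes;
}

echo 'Serial wait: ' . array_sum($tests) . " minutes\n"; // 35 minutes.
echo 'Concurrent wait: ' . max($load) . " minutes\n"; // 6 minutes, the slowest test case.
?>

With as many workers as test cases, the wall-clock time collapses to the slowest test case, which is exactly the effect described above.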

In addition to the enormous improvement in turnaround time, which enables the development workflow to move much faster, you can now find new ways to use those machine resources: testing contrib projects against core commits, compatibility tests between contrib modules, retesting all patches on commit to a related project, or checking what other patches a patch will break (to name a few). Can you even imagine? A Drupal sprint where the queue builds up an order of magnitude more slowly and runs through the queue 40x faster?

Now imagine having additional resources automatically started when the need arises. No need to imagine...it works (and did so 5 years ago). Dynamic spinning up of EC2 resources which could obviously be applied to other services that provide an API.

This single advantage and the world of possibility it makes available should be enough to justify the system, but there are plenty more items to consider, all of which were implemented and none of which will be present in the proposed initiative's solution.

Five Years Later

Five years after the original proposal, Drupal is left with a testbot that has languished and received no feature development. Contrast that with a Drupal that had continued to lead the way in automated testing with a system that shares many of the successful facets of travis-ci (which was developed later) and is superior in other aspects.

As was evident five years ago, the testbot cannot be supported in the way much of Drupal development is funded, since the testbot is not a site building component placed in a production site. This fact drove the development of a business model that could support the testbot, and it has since proven accurate as the current efforts continue to be plagued by under-resourcing. One could argue the situation is even more dire, since Drupal got a "freebie" so to speak with me donating nearly full-time for a couple of years, versus the two spare-time contributors that exist now.

On top of the lack of resources, the current initiative, whose stated goal is to "modernize" the testbot, is needlessly recreating the entire system instead of just adding Docker to the existing one. None of the other components being used can be described as "modern" since most pre-date the current system. Overall, this appears to be nothing more than code churn.

Assuming the code churn is completed some time far in the future; a migration plan is created, developed, and performed; and everything goes swimmingly, Drupal will have exactly what it has now. Perhaps some of the plugins already built in the ReviewDriven system will be ported and provide a few small improvements, but nothing overarching or worth the decade it took to get there. In fact the system will needlessly require a much rarer skill set, far more interactions between disparate components, and complexity to be understood just to be maintained.

Contrast that with an existing system that can run the entire test suite against a patch across a multitude of machines, seamlessly stitch the results together, and post back the result in under a minute. Contrast that with having had that system in place five years ago. Contrast that with the whole slew of improvements that could also have been completed in the four years since by a passionate, full-time team. Contrast that with, at the very least, deploying that system today. Does this not bother anyone else?

Contrast that with Drupal being the envy of the open source world, having deployed a solution superior to travis-ci and years earlier.

Please post feedback on the drupal.org issue.

Fitting a board inside a window sill

Working on a personal project, I came across the need to cut a board such that I could rotate it into place inside a window sill. One could cut the board an inch or so shorter, try it out, and adjust as necessary, but what fun is that? Instead we can use math to find the answer!

After drawing out the problem and labeling the knowns and unknowns, I played around with it for a bit to see what relationships I could draw. One can represent the problem as a simple right triangle with the bottom of the board as the hypotenuse and the bottom and side of the window as the other two sides of the triangle.

right triangle

Given an angle x you could determine the longest board that would fit (i.e. the hypotenuse). Given a function that represents the length of a board that would fit at a given angle, the minimum of the function could be found to determine the maximum length board that could make it all the way through. The cosine of the angle between the board and the bottom of the sill (x in the diagram) provides that relationship if the board were simply a line, and can be written as follows.

cos(x) = window width (a) / board length (c)

Next one needs to account for the fact that the board is not just a line, but has width and when rotated the starting point of the board bottom is moved away from the window sill wall.

board animation

That means the effective length of the window bottom (side a) in our equation is offset by some amount. Looking at the animation you can see that the corner of the board forms two right triangles. Since the bottom of the window is considered a continuous line there are 180 degrees (pi radians) on either side. The board takes up 90 degrees (pi/2 radians) leaving 90 degrees to split between the two angles made by the board with the bottom of the window. If we take the original equation and rewrite it to include the offset from the second triangle we get the following.

cos(x) = (window width - offset) / board length

We can create an equation for the small triangle: we know the hypotenuse is the width of the board, we know the angle (y) is complementary to angle x, and we are trying to solve for the adjacent side (offset).

cos(y) = offset / board width

Solving for the offset and putting in terms of x:

y = (pi / 2) - x 
cos((pi / 2) - x) = offset / board width
offset = board width * cos((pi / 2) - x)

Plugging into the original equation with the offset and solving for board length.

cos(x) = (window width - (board width * cos((pi / 2) - x))) / board length
board length = (window width - (board width * cos((pi / 2) - x))) / cos(x)

Graphed using the following (values for my window and board (1x8)).

window width = 35 inches
board width = 8 inches
minimize (35 - 8 * cos((pi / 2) - x)) / cos(x) over [0, pi/2]

maximum board length graph

A graph zoomed in to [0, pi / 4] better demonstrates the length change.

maximum board length graph

The shortest board length does not occur while the board is resting flat against the window sill (x = 0 radians), which makes sense. Additionally, in the first graph you can see the board length goes out to infinity at pi/2 (90 degrees), which makes sense if you consider that a board standing perfectly upright would never touch the other side of the window. The answer that Wolfram Alpha provides is:

board length = 34.0735 inches

That means one should cut the board to 34 inches to be able to rotate it into place! For your convenience I have provided a widget which you can use to solve for your own dimensions, and a code sketch follows as well. Keep in mind this works for more than just windows: any rectangular object you wish to rotate into another rectangle.
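For those who would rather compute than use the widget, the following is a minimal PHP sketch that performs the same minimization numerically (the scan step is arbitrary; substitute your own window and board widths):

<?php
$window_width = 35; // inches
$board_width = 8;   // inches

// Scan x over [0, pi/2) and keep the smallest value of the board length
// function, i.e. the longest board that still rotates through the window.
$best = INF;
for ($x = 0; $x < M_PI / 2; $x += 0.0001) {
  $length = ($window_width - $board_width * cos(M_PI / 2 - $x)) / cos($x);
  $best = min($best, $length);
}
echo round($best, 4); // Prints approximately 34.0735.
?>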

Drupal integration module for Google App Engine

Since posting Drupal on Google App Engine, the App Engine team has been working hard to improve many of the troublesome areas identified in the original post, by other external sources, and in further internal discussions. In addition, I have developed a Drupal integration module for Google App Engine. The combination of the integration module and the work done internally provides a more compelling Drupal experience on App Engine.

For those of you who are not sure what features App Engine provides or why to consider App Engine have a look at the reference material. Getting started is also easier than ever as the whitelist has been removed and the SDK comes bundled with PHP. The rest of this post will focus on what improvements have been made, what the integration provides, and how to make use of it.

To see a functioning Drupal site making use of the App Engine module and the Memcache module see my demo site. The demo site includes an example of a file field served out of Google Cloud Storage.

If you are just interested in making use of the integration visit the Google App Engine Drupal project, have a look at the included README and/or the Getting started section of this post. The second half covers some technical details on the implementation for those who are interested.

Getting started

The preferred method of developing for App Engine is to work locally using the SDK and the included development server. Once the site is ready it can be uploaded to App Engine.

You can choose to skip the local development setup and work directly on App Engine, but you will still need the SDK setup for uploading the app. Regardless, there are a number of ways to get a hold of Drupal and the integration module.

  • all-in-one download
  • drush make
  • manual

All-in-one download

Simply download the full release containing a patched Drupal core, Google App Engine module, and Memcache module.

Extract the files and enjoy.

Drush make

The App Engine module includes Drush make scripts. There are two profiles: minimal, using drupal.make, and full (to include more going forward), using drupal-full.make. Only the latter contains the Memcache module.

Depending on which profile is desired, invoke the appropriate command.

$ drush make http://drupalcode.org/project/google_appengine.git/blob_plain/refs/heads/7.x-1.x:/root/drupal-full.make
$ drush make http://drupalcode.org/project/google_appengine.git/blob_plain/refs/heads/7.x-1.x:/root/drupal.make

Manual

Obviously, the components can be downloaded and manually assembled as well.

Drupal installation

Follow the normal process for installing Drupal. Keep in mind that the Drupal files will not be writable on App Engine so any changes to settings.php or any other modules configuration will need to be made prior to upload. See the SETTINGS.PHP section of the README for details on setting up the settings.php file for development against the local server and production App Engine.

The gist of the comments is to use the following for database credentials, filling in the {} sections.

<?php
if (strpos($_SERVER['SERVER_SOFTWARE'], 'Google App Engine') !== false) {
  // Cloud SQL database credentials.
  $databases['default']['default'] = array(
    'database' => '{DATABASE}',
    'username' => 'root',
    'password' => '',
    'unix_socket' => '/cloudsql/{SOME_PROJECT}:{DATABASE}',
    'port' => '',
    'driver' => 'mysql',
    'prefix' => '',
  );
}
else {
  // Local database credentials.
  $databases['default']['default'] = array(
    'database' => '{DATABASE}',
    'username' => '{USERNAME}',
    'password' => '{PASSWORD}',
    'host' => 'localhost',
    'port' => '',
    'driver' => 'mysql',
    'prefix' => '',
  );
}
?>

Memcache module

If you choose to make use of the Memcache module be sure to follow the setup instructions. For the default setup simply add the following lines to the bottom of settings.php.

<?php
$conf['cache_backends'][] = 'sites/all/modules/memcache/memcache.inc';
// The 'cache_form' bin must be assigned to non-volatile storage.
$conf['cache_class_cache_form'] = 'DrupalDatabaseCache';
$conf['cache_default_class'] = 'MemCacheDrupal';
$conf['memcache_key_prefix'] = 'something_unique';
?>

Additionally, a patch should be applied to define MEMCACHE_COMPRESSED which is missing from the App Engine implementation (fix upcoming).

App Engine module

To make use of the integration enable the App Engine module. In order to use Google Cloud Storage be sure to configure the default storage bucket by visiting admin/config/media/file-system in your Drupal site.

GCS settings

If you choose to enable CSS/JS aggregation be sure to read through the serving options on admin/config/development/performance and choose the one that best suits your workflow.

GCS settings

Importing

If you are looking to import an existing site into Google App Engine take a look at the following documentation links.

Be sure to add and enable the App Engine module to the existing code base.

Support

If you encounter any difficulties please let us know via the appropriate channel.

  • For general App Engine (PHP) support please visit Stackoverflow
  • For issues specific to this Drupal module please visit the issue queue

Integration details

The rest of the post will discuss the implementation details behind the integration. The features provided by the 1.0 release of the App Engine module are as follows. See previous blog post for details on what led up to this work.

  • App Engine mail service
  • Cloud Storage
  • Drupal core patch

App Engine mail service

Implements Drupal MailSystemInterface to make use of the App Engine mail service. The system email address will be used as the default from address and must be authorized to send mail. To configure the address, visit admin/config/system/site-information. For details on App Engine mail service, read this document.

For further details see the mail integration code.

Cloud Storage

The GAE team has provided a PHP stream wrapper which allows the use of standard PHP file handling functions for interacting with GCS. The current implementation requires a storage bucket in the file path which means applications must be altered to not only make use of the stream wrapper, but include a bucket in all file paths. Additionally, Drupal requires the implementation of an additional set of methods (DrupalStreamWrapperInterface) on top of the default set required for all PHP stream wrappers.

Instead of attempting to provide a format that allows the bucket to be optionally included, the best route forward is to provide a bucketless stream wrapper that always assumes no bucket is included and instead uses a default bucket. The new stream wrapper would sit atop the default stream wrapper and add the default bucket to all paths before handing off to the parent implementation. For lack of anything more descriptive the letter b (for bucketless) was appended to the gs stream wrapper. The examples below demonstrate the usage.

  • gs://defaultbucket/dir1/dir2/file
  • gsb://dir1/dir2/file (assumed defaultbucket and thus equivalent to first example)
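A minimal sketch of the path translation at the heart of the bucketless wrapper (the function and setting names here are illustrative, not the module's actual API):

<?php
// Rewrite a gsb:// URI to a gs:// URI that includes the configured default
// bucket before handing off to the parent gs:// implementation.
function gsb_to_gs($uri) {
  $bucket = variable_get('gae_default_bucket', 'defaultbucket');
  return 'gs://' . $bucket . '/' . substr($uri, strlen('gsb://'));
}

// Prints gs://defaultbucket/dir1/dir2/file.
echo gsb_to_gs('gsb://dir1/dir2/file');
?>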

The additional stream wrapper solution means that paths can be identical to those used with a local file-system, but applications wishing to utilize more than one bucket can still do so.

An additional implication of removing the bucket is that it allows for staging sites in different environments with the same set of files and corresponding data, since a different bucket may be configured for the entire site instead of being duplicated in each path and requiring changes. Obviously, those applications that choose to use more than one bucket will need to handle such cases themselves. This also aids in site migration since file paths stored in the database do not need to be changed.

In lieu of an upstream GAE PHP runtime user-space setting, the Drupal module will provide a typical Drupal setting and make use of it in the bucketless stream wrapper. In the future, it would make sense for the Drupal setting to merely set the upstream user-space setting.

Implementation

The following is the class hierarchy used for implementing the bucketless and Drupal specific stream wrappers discussed in details below.

GCS hierarchy

In order to facilitate a clean implementation and the possibility of moving upstream, while working within the restriction that the current stream wrapper is a final class, a rough facsimile of the base implementation, which acts as a proxy, is provided to allow for extension. The bucketless stream wrapper is built on top of the facsimile. This provides two basic stream wrapper implementations without any Drupal specific additions. The facsimile is implemented as a PHP trait in order to allow for the multiple inheritance needed later by the Drupal wrappers.

There are two levels of integration with Drupal that make sense to allow GCS to be used as comprehensively and easily as possible. The first is providing stream wrappers that implement the additional functionality defined by the DrupalStreamWrapperInterface and the second is overriding the default provided local file-system stream wrappers to use GCS. The first is required for the second, but also allows for the use of GCS in a specific manner vs catch-all local file-system.

The three core stream wrappers (private://, public://, temporary://) are overridden via hook_stream_wrappers_alter() to use the GCS wrappers. The storage bucket must be configured in order for the GCS integration to function properly. The standard mechanisms for controlling the file system setup (admin/config/media/file-system) can be used and file fields can be stored within one of the default stream wrappers.
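As a rough illustration of the override (the class names are placeholders, not the module's actual classes), the alter hook looks something like this:

<?php
/**
 * Implements hook_stream_wrappers_alter().
 */
function google_appengine_stream_wrappers_alter(&$wrappers) {
  // Point the three core schemes at GCS-backed wrapper classes.
  $wrappers['public']['class'] = 'GaePublicStreamWrapper';
  $wrappers['private']['class'] = 'GaePrivateStreamWrapper';
  $wrappers['temporary']['class'] = 'GaeTemporaryStreamWrapper';
}
?>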

File MIME types are determined by DrupalLocalStreamWrapper::getMimeType() which consults file_mimetype_mapping() for a mapping of extensions to MIME types. The type is included in the stream context when writing files to GCS and as such the file will be served with the assigned MIME type.

Drupal core patch

In order to have Drupal run properly on Google App Engine a few changes need to be made to Drupal core. Those changes can be found in root/core.patch which is managed in the 7.x-appengine branch and rebased on top of Drupal core updates. The patch creates three other files within the appengine root directory that need to be placed in the Drupal root.

  • Alters drupal_http_request() in common.inc to work without requiring socket support.
  • Alters drupal_move_uploaded_file() in includes/file.inc to support newly uploaded files from the $_FILES array being referenced via a stream wrapper. In the case of App Engine all uploaded files are uploaded through the GCS proxy, hosted on GCS, and thus start with gs://. The change should be generally useful and has been rolled as a core patch.
  • Alters file_upload_max_size() in includes/file.inc to only check PHP ini setting 'upload_max_filesize' instead of also checking 'post_max_size' which is normally relevant, but in the case of App Engine is not since all uploads are sent through GCS proxy and are thus not affected by app instance post limits.
  • Alters drupal_tempnam() in includes/file.inc to manually simulate tempnam() since it is currently not supported by App Engine.
  • Alters system_file_system_settings() in modules/system/system.admin.inc to include the #wrapper_scheme property to be picked up by system_check_directory() in modules/system/system.module. Given that the current code avoids using the stream wrappers this is technically a bug and is a candidate for being fixed in Drupal core as well.
  • Alters system_requirements() in modules/system/system.install to skip the directory check since the GCS integration will not be loaded until the App Engine module is enabled.

A number of the changes included in the patch are being looked at and will hopefully become unnecessary in the future. The following are also included in the patch as a convenience, but they do not alter Drupal core.

  • Add app.yaml to root which provides basic information about the app to Google App Engine so that it can invoke Drupal properly.
  • Add php.ini to root which enables some php functions used by Drupal and turns on output buffering.
  • Add wrapper.php to root which simulates Apache mod_rewrite like behavior.

Aggregate CSS/JS

Since a local writable file-system is not available on Google App Engine for various reasons, the ability for Drupal to aggregate CSS and JS into combined files is restricted. There are three choices.

  • Directly from static files (recommended, but requires proper setup)

    Serving from static files requires that the aggregate files be uploaded with the app. There are a couple of ways to achieve this, some of which are better than others.

    • Build site locally using the development server and generate the files locally. During upload the aggregate files will be present and included with app.
    • Upload the app and generate the files while running on App Engine, writing them to GCS. Download the files locally into the app and re-upload. This method means that your app may serve out-of-date CSS or JS until you re-upload, which can cause all sorts of issues.

      gsutil makes it easy to download the css and js files from GCS.

      Run the following with the relevant values filled in.

./gsutil cp -R gs://{BUCKET}/sites/default/files/css {~/path/to/drupal}/sites/default/files/
./gsutil cp -R gs://{BUCKET}/sites/default/files/js {~/path/to/drupal}/sites/default/files/
  • From GCS using Drupal router as proxy (default)

    By default, aggregate files are served via a Drupal router which acts as a GCS proxy. The proxy should always work without any additional configuration, but this will consume instance hours for serving static aggregate resources.

  • Directly from GCS

    Serving directly from GCS does not require uploading static files with the app, but can cause difficulties since resources referenced from CSS will need to be uploaded to GCS as well (or referenced using an absolute URL). Also note that the CSS and JS files will be served from a different domain, which may cause complications.

Closing

There are areas that could be improved and I plan to continue working so stay tuned. If you have any ideas or want to pitch in you may do so in the Google App Engine issue queue. As always I look forward to your feedback.

Drupal on Google App Engine

For the latest information see the newer post

Today Google announced PHP support for Google App Engine! I have been one of the lucky folks who had early access and so of course I worked on getting Drupal up and running on GAE. There are a few things that still need to be worked out which I will continue to discuss with the app engine team, but I have a working Drupal setup which I will detail below. Note that much of this may also apply to other PHP frameworks.

Getting up and running

I will cover the steps specific to getting Drupal 7 (notes for Drupal 6 along with branches in repository) up and running on App Engine and not how to use the SDK and development flow which is detailed in the documentation. For an example (minimal profile from core) of Drupal running on Google App Engine see boombatower-drupal.appspot.com.

Sign up to be whitelisted for PHP runtime

Currently, the PHP runtime requires you to sign up specifically for access. Assuming you have access you should be able to follow along with the steps below. Otherwise, the following steps will give you a feel for what it takes to get Drupal running on GAE.

Create an app

Create app by visiting appengine.google.com and clicking Create Application, see the documentation for more details.

Create an Application

Create a Cloud SQL Instance

Follow the documentation for setting up a Cloud SQL Instance. Be sure to give your application access to the instance.

Create a Cloud SQL Instance

Once the instance has been created select the SQL Prompt tab and create a database for your Drupal site as follows.

CREATE DATABASE drupal;

Create a Cloud SQL Database

Download Drupal

There are a few tweaks that need to be made to get Drupal to run properly on GAE which are explained below, but for the purposes of this walk-through one can simply download my branch containing all the changes from github.

git clone --branch 7.x-appengine https://github.com/boombatower/drupal-appengine.git
 
# or for Drupal 6
git clone --branch 6.x-appengine https://github.com/boombatower/drupal-appengine.git

or download as a zip or for Drupal 6 download as a zip.

Configure Drupal database settings

Since GAE does not allow the filesystem to be writeable one must configure the database settings ahead of time.

Copy default.settings.php as settings.php and add the following below $databases = array(); around line 213.

<?php
$databases = array();
$databases['default']['default'] = array(
  'driver' => 'mysql',
  'database' => 'drupal', // The database created above (example used 'drupal').
  'username' => 'root',
  'password' => '',
  // Setting the 'host' key will use a TCP connection which is not supported by GAE.
  // The name of the instance created above (ex. boombatower-drupal:drupal).
  'unix_socket' => '/cloudsql/[INSTANCE]',
  // 'unix_socket' => '/cloudsql/boombatower-drupal:drupal',
  'prefix' => '',
);
?>

For Drupal 6 around line 91.

<?php
$db_url = 'mysql://root:@cloudsql__boombatower-drupal___drupal/drupal';
?>

Push to App Engine

Update the application name in the app.yaml file to the one you created above and upload by following the documentation.

# See https://developers.google.com/appengine/docs/php/config/appconfig.
 
application: drupal # <-- change this to your application
version: 1
runtime: php
api_version: 1
threadsafe: true
 
handlers:
# Default handler for requests (wrapper which will forward to index.php).
- url: /
  script: wrapper.php
 
# Handle static requests.
- url: /(.*\.(ico$|jpg$|png$|gif$|htm$|html$|css$|js$))
  # Location from which to serve static files.
  static_files: \1
  # Upload static files for static serving.
  upload: (.*\.(ico$|jpg$|png$|gif$|htm$|html$|css$|js$))
  # Ensures that a copy of the static files is left for Drupal during runtime.
  application_readable: true
 
# Catch all unhandled requests and pass to wrapper.php which will simulate
# mod_rewrite by forwarding the requests to index.php?q=...
- url: /(.+)
  script: wrapper.php

Then upload the app:

appcfg.py update drupal/

Install

Visit your-app.appspot.com/install.php and follow the installation steps just as you would normally except that the database information will already be filled in. Go ahead and ignore the mbstring warning and note that the GAE team is looking into supporting mbstring.

Explanation of changes

If you are interested in what changes/additions were made and the reasons for them continue reading, otherwise you should have a working Drupal install ready to explore! There are a few basic things that do not work perfectly out of the box on GAE. The changes can be seen by diffing the 7.x-appengine branch against the 7.x branch in my repository.

File directory during installation

The Drupal installer requires that the files directory be writeable, but GAE does not allow for local write access thus the requirement must be bypassed in order for the installation to complete.

Author: boombatower <boombatower@google.com>
Date:   Wed May 15 15:49:03 2013 -0700
 
    Hack to trick Drupal into ignoring that file directory is not writable.
 
diff --git a/modules/system/system.install b/modules/system/system.install
index 1b037b8..9931aad 100644
--- a/modules/system/system.install
+++ b/modules/system/system.install
@@ -333,6 +333,8 @@ function system_requirements($phase) {
     }
     $is_writable = is_writable($directory);
     $is_directory = is_dir($directory);
+    // Force Drupal to think the directories are writable during installation.
+    $is_writable = $is_directory = TRUE;
     if (!$is_writable || !$is_directory) {
       $description = '';
       $requirements['file system']['value'] = $t('Not writable');

Clean URLs

In order to take advantage of clean URLs, which most sites use, mod_rewrite is required in Apache environments. Since GAE does not use Apache it does not support mod_rewrite, so another solution is needed. The app.yaml file can configure handlers which allow for wildcard matching, meaning multiple paths can easily be routed to a single script. Taking that one step further, we can alter the $_GET['q'] variable just as mod_rewrite would so that Drupal functions properly. Rather than modify core, this can be done via a wrapper script as shown below (this should work well for other PHP applications).

<?php
/**
 * @file
 * Provide mod_rewrite like functionality and correct $_SERVER['SCRIPT_NAME'].
 *
 * Pass through requests for root php files and forward all other requests to
 * index.php with $_GET['q'] equal to path. In terms of how the requests will
 * seem please see the following examples.
 *
 * - /install.php: install.php
 * - /update.php?op=info: update.php?op=info
 * - /foo/bar: index.php?q=/foo/bar
 * - /: index.php?q=/
 */

$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

// Provide mod_rewrite like functionality. If a php file in the root directory
// is explicitly requested then load the file, otherwise load index.php and
// set get variable 'q' to $_SERVER['REQUEST_URI'].
if (dirname($path) == '/' && pathinfo($path, PATHINFO_EXTENSION) == 'php') {
  $file = pathinfo($path, PATHINFO_BASENAME);
}
else {
  $file = 'index.php';

  // Provide mod_rewrite like functionality by using the path which excludes
  // any other part of the request query (ie. ignores ?foo=bar).
  $_GET['q'] = $path;
}

// Override the script name to simulate the behavior without wrapper.php.
// Ensure that $_SERVER['SCRIPT_NAME'] always begins with a / to be consistent
// with HTTP request and the value that is normally provided (not what GAE
// currently provides).
$_SERVER['SCRIPT_NAME'] = '/' . $file;
require $file;
?>

PHP $_SERVER['SCRIPT_NAME'] variable

The $_SERVER['SCRIPT_NAME'] implementation differs from the Apache mod_php implementation, which can cause issues with a variety of PHP applications. The variable matches the HTTP spec and not the filesystem when called through Apache.

For example a script named foo.php contains the following.

<?php
var_dump($_SERVER['SCRIPT_NAME']);
?>

When executed from command line here are the results.

$ php foo.php
string(7) "foo.php"
 
$ php ./foo.php
string(9) "./foo.php"

When invoked through Apache like http://example.com/foo.php.

string(8) "/foo.php"

The documentation does not talk about this behavior (although many comments demonstrated the expected Apache behavior), but it is definitely depended on.

The difference causes Drupal to format invalid URLs.

example.com.foo.css (instead of ...com/foo.css)
example.comsubdir/foo.css (instead of ...com/subdir/foo.css)

Drupal derives the URL from dirname() of $_SERVER['SCRIPT_NAME'], which will return . if there are no slashes, or just / for something like /index.php.

The wrapper script above solves this by ensuring that the SCRIPT_NAME variable always starts with a leading slash.

HTTP requests

GAE does not yet support outbound sockets for PHP (although they are supported for Python and Java) and if/when it does, the preferred way will continue to be streams due to automatic caching of outbound requests using urlfetch. I have included a small change to provide basic HTTP requests through drupal_http_request(). A proper solution would be to override the drupal_http_request_function variable and provide a fully functional alternative using streams. Drupal 8 has converted drupal_http_request() to use Guzzle, which supports streams. Making a similar conversion for Drupal 7 seems like the cleanest way forward rather than reinventing the change.
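For reference, a basic outbound request using standard PHP streams (which App Engine backs with urlfetch) looks like the following; no sockets are involved:

<?php
// Build a stream context and fetch a URL over HTTP.
$context = stream_context_create(array(
  'http' => array(
    'method' => 'GET',
    'timeout' => 30,
  ),
));
$body = file_get_contents('http://example.com/', false, $context);
var_dump($http_response_header); // Status line and response headers.
?>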

php.ini

GAE disables a number of functions for security reasons, but only softly disables some functions which may then be enabled. Drupal provides access to phpinfo() from admin/reports/status and uses output buffering, both of which are disabled by default. The included php.ini enables both functions in addition to getmypid which is used by drupal_random_bytes().

# See https://developers.google.com/appengine/docs/php/config/php_ini.
 
# Required for ob_*() calls which you can find by grepping.
# grep -nR '\sob_.*()' .
output_buffering = "1"
 
# See https://developers.google.com/appengine/docs/php/runtime#Functions-That-Must-Be-Manually-Enabled
# phpinfo: Provided on admin/reports/status under PHP -> "more information".
# getmypid: Used by drupal_random_bytes(), but not required.
google_app_engine.enable_functions = "getmypid, phpinfo"

Future

I plan to continue working with the GAE team to ensure that support for Drupal can be provided in a clean and simple manner. Once current discussions have been resolved I hope to provide more formal documentation and support for Drupal.

File handling

I worked on file support, but there were a number of upcoming changes that would make things much cleaner so I decided to wait. GAE provides a stream wrapper for Google Cloud Storage which makes using the service very simple. Assuming you have completed the prerequisites, files on GCS may be accessed using standard PHP file handling functions as shown in the documentation.

<?php
$file = 'gs://my_bucket/hello.txt';
file_put_contents($file, 'hello world');

$contents = file_get_contents($file);
var_dump($contents); // prints: hello world
?>

Unfortunately, the wrapper does not currently support directories nor does file_exists() work properly. Keep in mind that the filesystem is flat, so a file may be written to any path without explicitly creating the directory, meaning one can write to gs://bucket/foo/bar.txt without creating the directory foo. With that being the case it is possible to get some hacky support by simply disabling all the directory code in Drupal, but the result is not really usable. It should be possible to hack support in through the stream wrapper since directories are simply specially named files, but the App Engine team has indicated they will look into the matter so hopefully this will be solved cleanly.

Assuming the stream wrappers are fixed up, support can be added in much the same way that Amazon S3 support is added, except that no additional library will be needed.

Additionally, the documentation also notes the following.

Direct file uploads to your POST handler, without using the App Engine upload agent, are not supported and will fail.

In order to support file uploads, the form must be submitted to the URL provided by CloudStorageTools::createUploadUrl() and the forwarded result handled by Drupal. A benefit of proxying requests through the uploader service is that uploaded files may be up to 100TB in size.
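A rough sketch of what that looks like (the success path and bucket name are placeholders; consult the App Engine PHP documentation for the exact options):

<?php
require_once 'google/appengine/api/cloud_storage/CloudStorageTools.php';

use google\appengine\api\cloud_storage\CloudStorageTools;

// The POST goes to the upload agent, which stores the file in GCS and then
// forwards the request to the given path for Drupal to handle.
$upload_url = CloudStorageTools::createUploadUrl('/file/upload', array(
  'gs_bucket_name' => 'my_bucket',
));
echo '<form action="' . $upload_url . '" method="post" enctype="multipart/form-data">';
?>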

Other

There are a number of additional services provided as part of GAE of which Drupal could take advantage.

Closing

Hopefully this will be useful in getting folks up and running quickly on GAE with Drupal and understanding the caveats of the environment. Obviously there is a lot more to cover and I look forward to seeing what others publish on the matter.

Steam for Linux thoughts

I am a long-time Linux user and avid gamer who has always been excited about the prospect of gaming on Linux. I set up various games in Wine, applied patches, maintained a game on WineHQ, tried out the Linux versions of games like Unreal 2004 and the Humble Indie Bundle games, etc. When the first rumors about Valve possibly porting Steam to Linux started to spread a couple of years ago I couldn't wait. Fast forward and now Steam for Linux is a reality. Obviously, I have been playing with Steam on Linux and various games from my collection that are available for Linux. I noticed a couple of nice differences when gaming on Linux and figured it was worth writing up my overall thoughts.

Steam for Linux

No more repetitive installation

Unlike Windows, where you are greeted with the installing DirectX dialog over and over, Steam on Linux simply starts the game. It isn't that big of a deal for typical Linux applications since package managers remove the need for silly installation wizards, but it is one of the small things that just feels smoother.

Full-screen window focus

One of the more annoying aspects of using full-screen applications on Windows with multiple monitors is that although I can see the applications on the other screens while playing the game, I am forced to completely minimize the game in order to use any other applications. Most of the time I have other persistent applications such as Skype open while playing games and it is somewhat annoying to switch back and forth. World of Warcraft seemed to get around this with a full-screen windowed mode that removed the window decoration and made the window the size of the screen (thus looking the same as full-screen) without entering full-screen mode. This meant you could switch much more quickly and without losing the game window. Being able to keep an eye on the game while doing something outside it is handy, and this is the norm in my experience with full-screen applications on Linux.

Valve listening to community

Valve created a GitHub repo for the purpose of tracking issues with Steam on Linux and has done an excellent job reading through, managing, responding to, and actually fixing the issues presented. Valve even worked on an agreeable license that allows distributions to package Steam. I filed a few issues myself and was impressed with the prompt responses from Valve employees and with seeing things get fixed.

Drivers

A big pain point in the Linux world has been video drivers. When I originally started using Linux the proprietary drivers were pretty much the only way to go. During my initial days of Linux I used an Nvidia card, and the proprietary drivers provided by Nvidia were not bad, but were definitely nothing like the Windows drivers. Various activities would result in unpleasant behavior and slowness. I later purchased an ATI card and, with the open source drivers really coming into their own, I started using them. Although the OSS drivers (both radeon [ati] and nouveau [nvidia]) worked great for 2D, they had little to no support for 3D. I would argue that both OSS drivers perform much better than their proprietary counterparts for general desktop use.

With the advent of Steam for Linux I figured it was worth trying out the latest ATI drivers with my Radeon HD 7970 and putting them through their paces. I was quite hopeful given the posts from Valve stating they got better performance than Windows with the Nvidia drivers and seemed to be working to get the drivers improved. The latest driver release from ATI has a specific note about a fix for Big Picture mode in Steam.

[370839]: Resolves a sporadic Steam Big Picture mode crashing issue encountered with AMD Radeon™ HD 7000, 6000 Series

I was pleasantly surprised to find that the drivers worked quite well and a number of my prior complaints were no longer an issue. The drivers are still not perfect, but they perform quite well. WebGL demos run beautifully and the games I have tried from Steam work extremely well. I was even able to max the settings for Trine 2 for a beautiful result.

Closing thoughts

It will be interesting to see how Valve's move plays out for the future of the Linux desktop and gaming. I can only imagine what being able to profile things through the Linux kernel source will allow game developers in terms of tuning and diagnosing issues. There will obviously be holdouts and whatnot, but without gaming holding people back, from a purely superficial view Linux costs nothing and can browse the web just fine...what more do 99% of Windows users need? Of course, who doesn't love wobbly Steam? I am definitely looking forward to seeing improved driver quality and, with the continued rise of HTML5, less and less platform-specific development.

For those interested I am running on openSUSE 12.2 (tumbleweed). For easy installation just visit software.opensuse.org and search for steam.

Google

I took my first big step into open source and Drupal through the Google Highly Open Participation Contest back in 2007-2008. The next two summers I went on to participate in Google Summer of Code and just last month I started working full-time at Google! Never did I imagine such a progression back when I first started.

Working for Google is a great opportunity for numerous reasons, not the least of which...it's Google. The software, languages, and tools that Google uses are quite different from the tools to which I am accustomed, but have lots of similarities. It has been encouraging to see the similarities to Drupal in the processes and the reasons for the ways things are done. The story behind Google's testing efforts follows a path similar to that of getting testing into Drupal core: the difficulties, the processes set up, and the systems put in place. Hearing the same reasons for decisions as we came up with in the Drupal project has been encouraging. On the flip side, looking at the differences in development processes and tools has been rather enlightening.

The aspect that I am most excited about for Drupal is using my Google "20% time" to contribute to Drupal. Having the ability to spend more regular time working on Drupal projects such as qa.drupal.org is something that I believe will make a big difference. On my first 20% day, Friday (today), I will be working on the next generation testing system to replace the current qa.drupal.org. The system is the open sourced portion of the platform that powers ReviewDriven. The code can be found on github at drupalorg_qa, drupalorg_qa_worker, and also through a number of projects on Drupal.org which will become the primary hub. The plan is to run the new system alongside the existing system and to demonstrate it to the community in the near future.

Another big change that came with the job was the move to Mountain View, California from Omaha, Nebraska. It is quite a big change, both in environment and living style and in leaving friends and family, and I am still getting used to it, but I expect it to be a rewarding opportunity. I would like to thank those in the community whose encouragement has meant a lot to me, specifically Angie Byron (webchick) and Kieran Lal (amazon).

Lastly, did I mention the amazing connection speed from my office! :)

Noogler welcome

Automatically backport commits using git

For those who just want the result please scroll to the bottom, otherwise you can read the story.

I started looking into how to backport commits since I know how easily git forward ports commits and figured there had to be a better way than the tricks and manual methods recommended in documentation and various results from Google. My original plan was to make a plugin that would re-roll patches that need forward porting automatically on drupal.org and, if possible, backport as well. Over the course of an hour and a half I found a variety of snippets that got me part of the way there, developed a basic idea of how to accomplish backporting in git, and slowly refined it down to built-in commands. Huge thanks to cbreak and FauxFaux in #git on freenode!

The basic idea I started from was to simply remove all the commits between the commit you want to patch on top of and the last commit in the tree. To test my approach I started by creating a simple situation representing 7.x -> 8.x using the following.

$ git checkout -b test 8.x
$ git mv core/CHANGELOG.txt core/CHANGELOG2.txt
$ git commit -m "move"
 
$ echo "hello world" > core/CHANGELOG2.txt
$ git commit -am "edit"

I then proceeded to remove the 2nd to last commit ("move") to see if the last commit ("edit") would be updated to edit core/CHANGELOG.txt instead of core/CHANGELOG2.txt.

$ git rebase -i HEAD~3 # commented out the "move" commit and saved
$ git show

To my delight the "edit" commit was indeed updated.

The next step was to figure out how to remove all the commits between the one to be backported and the last point at which 7.x and 8.x "last overlap", or the point at which 8.x was branched (those are not necessarily the same and in Drupal's case they are not). I first envisioned dumping git rebase -i to a file, editing it with a script to remove commits, and then rebasing. To that end it was suggested to change the GIT_EDITOR environment variable to a grep command and then use git rebase -i, which would work, but the better approach is to use --onto which does exactly that.

$ git rebase --onto=A B

In simplest terms: remove all commits between A and B non-inclusively.

Now I simply needed a way to find the best point to merge onto. There are several snippets on Stack Exchange and elsewhere that find the "oldest ancestor", which would work, but a) are not optimal, and b) require long snippets that are not easy to understand. When I dug around in #git I was pointed to git merge-base, which finds the best point for such an operation.

$ git merge-base 7.x 8.x

Finds the last common commit between the 7.x and 8.x branches. Using this we get the following.

$ git rebase --onto=`git merge-base 7.x 8.x` HEAD~1

Backport the latest commit onto the merge-base for 7.x and 8.x. Finally, we just need to forward port the patch to the end of the 7.x branch which is easy.

$ git rebase 7.x

Result

So putting it all together we get the following powerful one-liner that will backport the last commit in the 8.x branch to the 7.x branch.

$ git rebase --onto=`git merge-base 7.x 8.x` HEAD~1 && git rebase 7.x

If you want to backport something other than the last commit simply checkout a branch with that commit as the HEAD.

$ git checkout -b backport-branch commit-to-backport

This can also be extended to backport a patch straight on drupal.org by downloading and committing the patch first.

$ your favorite download method (wget, curl, etc)
$ git apply some.patch
$ git commit -am "patch to backport" 
$ git rebase --onto=`git merge-base 7.x 8.x` HEAD~1 && git rebase 7.x
$ git show > some.patch

You can of course use a fancier git format-patch and what not, but this gets the point across. Enjoy!


Code coverage reports

Today, I made public a whole bunch of commits I made while improving the Code coverage project for use with ReviewDriven. You may remember the screenshot I included in Part 2: Breathing new life into the testbot, which is where this is coming from. Thanks to the improvements the module now actually works and is much more efficient, polished, and accurate, among other things. With that in mind I am proud to announce the first stable release of the project targeted at Drupal 7.14.

Using the module you can get coverage reports for individual page executions or any combination of tests. Code coverage reports can be extremely useful for determining areas of code that are not tested at all or for providing a snapshot of the code involved in generating a page.

Pages

Code coverage can be recorded by adding ?code_coverage=true to the end of a page URL. After the page has completed execution a link will be placed at the bottom of the HTML which will display outside the page style.
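For example, on a hypothetical page: http://example.com/node/1?code_coverage=true.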

Coverage page

The link will open the coverage report generated for that page request. The report will include all the files that were loaded during the execution of the page.

Tests

Similarly, a link is provided for the code coverage recorded during a test run.

Coverage test

Reports

The report includes two parts: 1) the summary or index, and 2) the line-by-line coverage information. The links described above point to the summary of the coverage information.

Coverage summary

The links in the summary point to the line-by-line coverage information overlaid on the corresponding code. The colors indicate the following.

  • green = executed
  • red = not executed
  • gray = ignored (or non-executable)

Coverage example

Filters

The coverage scope may be filtered to focus on improving coverage for a particular module/file/directory. Reducing the scope will also improve the coverage recording performance which may be useful when dealing with large tests.

Coverage filter

Future

I have already integrated the Code coverage project with Conduit (the open source ReviewDriven platform) which will be replacing the current system running qa.drupal.org. The plan is to get the new platform up and running in parallel with the current system at which time regular coverage runs against core (and contrib projects) can be made publicly available.

Part 3: Testing battle plan

I have been meaning to make this post for quite some time. I have posted pieces before and had many discussions, but I have yet to write it out formally. Moshe's recent post about Upal motivated me to finally write the post.

Background

The SimpleTest code for mimicking browser behavior, specifically the form handling, has required a fair amount of upkeep and improvement, and has generally wasted resources that could be better applied elsewhere. I attempted to clean things up with the external browser component, but that effort ended up dying. Over time I became more and more convinced that we should revisit the basis for our testing system and try to rebuild things on an existing framework.

You may be wondering why we ever decided to create our own framework in the first place, which is indeed a good question. Back before we had a testing framework in core, the concern with adding the SimpleTest.org library to core was its size. At the time SimpleTest was not ready to depend on PHP 5 and specifically SimpleXML. It soon became clear that we could develop our own internal browser using a combination of PHP 5 tools that ended up being MUCH smaller than SimpleTest.org's implementation. This process was driven by the poor assumption that we should commit the testing framework to Drupal core. In hindsight that was probably not the best idea.

Rather than commit a third-party library to core, it should simply be included using a build script (like many other projects), and Drupal has a system tailored to do exactly that: Drush make. We could even use this approach for jQuery instead of committing the entire library into the core repository. Drupal.org already supports invoking Drush make scripts during release building so the general public wouldn't even notice the difference.

In addition to the size problem there is the issue of bandwidth, which I and many others have discussed many times before. Keeping the Drupal testing integration (with testing library) in contrib would allow it to be maintained and more easily developed while removing an unnecessary burden from Drupal core.

Combination of tools

It is great to see Moshe's post and plan for building atop PHPUnit. Using PHPUnit definitely seems like a great start as it provides integration with many other tools and a familiar API, and using drush removes the "two Drupal sites" problem, but PHPUnit doesn't replace the browser component nor provide a JavaScript testing platform. At this point it seems prudent to build the functional tests atop Selenium, which allows the tests to be written in PHP while eliminating the need for the custom browser component and allowing for JavaScript testing.
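To give a feel for the direction, here is a rough sketch of a functional test written in PHP atop Selenium via PHPUnit's Selenium extension (the site URL and assertion are invented for illustration):

<?php
/**
 * Illustrative functional test; not part of any existing Drupal test suite.
 */
class FrontPageTest extends PHPUnit_Extensions_SeleniumTestCase {
  protected function setUp() {
    $this->setBrowser('*firefox');
    $this->setBrowserUrl('http://drupal.local/');
  }

  public function testFrontPageTitle() {
    // Drive a real browser to the front page and check the document title.
    $this->open('/');
    $this->assertTitle('Welcome to Drupal');
  }
}
?>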

I was part of the initial effort to get QUnit into core along with cwgordon7. We had a working version integrated with the test runner, but things were derailed for various reasons which has since spawned the QUnit project on Drupal.org.

Webchick summed it up:

Yeah, ideally we use PHPUnit and QUnit for PHP/JS unit testing, respectively, and Selenium for functional testing. Those are each the best tools for the job.

Selenium has also received some love in the form of SimpleTest integration project on Drupal.org. We have the makings of a great testing setup, but we need to put the pieces together.

Battle plan

Moving forward it seems prudent to continue maintaining each of the pieces in their respective locations and outside of core.

  • Selenium project needs to refactor to build on Upal
  • Rebuild DrupalWebTestCase as a compatibility layer on top of Selenium
  • Integrate the QUnit project with Upal in a similar fashion to Selenium
  • Provide PHPUnit test output parser for testbot
  • Provide a drush make script for testing in core or in a central hub repository in contrib
  • Inventory and refactor Drupal 8 tests to use the new system while removing duplication and waste

Let's make this happen!
