Drupal on Google App Engine

For the latest information, see the newer post.

Today Google announced PHP support for Google App Engine! I have been one of the lucky folks who had early access, so of course I worked on getting Drupal up and running on GAE. There are a few things that still need to be worked out, which I will continue to discuss with the App Engine team, but I have a working Drupal setup which I will detail below. Note that much of this may also apply to other PHP frameworks.

Getting up and running

I will cover the steps specific to getting Drupal 7 up and running on App Engine (notes for Drupal 6 are included, along with branches in the repository), not how to use the SDK or the development flow, which are detailed in the documentation. For an example of Drupal running on Google App Engine (the minimal profile from core), see boombatower-drupal.appspot.com.

Sign up to be whitelisted for PHP runtime

Currently, the PHP runtime requires you to sign up specifically for access. Assuming you have access you should be able to follow along with the steps below. Otherwise, the following steps will give you a feel for what it takes to get Drupal running on GAE.

Create an app

Create an app by visiting appengine.google.com and clicking Create Application; see the documentation for more details.

Create a Cloud SQL Instance

Follow the documentation for setting up a Cloud SQL Instance. Be sure to give your application access to the instance.

Once the instance has been created select the SQL Prompt tab and create a database for your Drupal site as follows.

CREATE DATABASE drupal;

Download Drupal

There are a few tweaks that need to be made to get Drupal to run properly on GAE which are explained below, but for the purposes of this walk-through one can simply download my branch containing all the changes from github.

git clone --branch 7.x-appengine https://github.com/boombatower/drupal-appengine.git
 
# or for Drupal 6
git clone --branch 6.x-appengine https://github.com/boombatower/drupal-appengine.git

Alternatively, download the Drupal 7 branch as a zip, or the Drupal 6 branch as a zip.

Configure Drupal database settings

Since GAE does not allow the filesystem to be writeable one must configure the database settings ahead of time.

Copy default.settings.php to settings.php and add the following below $databases = array(); around line 213.

<?php
$databases = array();
$databases['default']['default'] = array(
  'driver' => 'mysql',
  'database' => 'drupal', // The database created above (example used 'drupal').
  'username' => 'root',
  'password' => '',
  // Setting the 'host' key will use a TCP connection which is not supported by GAE.
  // The name of the instance created above (ex. boombatower-drupal:drupal).
  'unix_socket' => '/cloudsql/[INSTANCE]',
//  'unix_socket' => '/cloudsql/boombatower-drupal:drupal',
  'prefix' => '',
);
?>

For Drupal 6, set $db_url around line 91.

<?php
$db_url = 'mysql://root:@cloudsql__boombatower-drupal___drupal/drupal';
?>
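
If the same settings.php should also work against a local MySQL server during development, the socket can be chosen at runtime. The following is only a sketch: the function name is made up for illustration, it assumes production reports a SERVER_SOFTWARE value beginning with "Google App Engine", and it assumes a local MySQL server listening on 127.0.0.1.

```php
<?php
/**
 * Hypothetical helper (not part of Drupal or the GAE SDK): builds the
 * $databases['default']['default'] array for production or local use.
 *
 * @param string $instance
 *   Cloud SQL instance name, e.g. 'boombatower-drupal:drupal'.
 * @param string $server_software
 *   Normally the value of $_SERVER['SERVER_SOFTWARE'].
 */
function example_gae_database_settings($instance, $server_software) {
  $settings = array(
    'driver' => 'mysql',
    'database' => 'drupal',
    'username' => 'root',
    'password' => '',
    'prefix' => '',
  );
  // Assumption: production reports a value starting with "Google App Engine".
  if (strpos($server_software, 'Google App Engine') === 0) {
    // Cloud SQL is reached through a unix socket, never through 'host'.
    $settings['unix_socket'] = '/cloudsql/' . $instance;
  }
  else {
    // Assumption: a local MySQL server reachable over TCP for development.
    $settings['host'] = '127.0.0.1';
  }
  return $settings;
}
```

In settings.php the result would be assigned to $databases['default']['default'], passing in the instance name created above and the current $_SERVER['SERVER_SOFTWARE'] value.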

Push to App Engine

Update the application name in the app.yaml file to the one you created above and upload by following the documentation.

# See https://developers.google.com/appengine/docs/php/config/appconfig.
 
application: drupal # <-- change this to your application
version: 1
runtime: php
api_version: 1
threadsafe: true
 
handlers:
# Default handler for requests (wrapper which will forward to index.php).
- url: /
  script: wrapper.php
 
# Handle static requests.
- url: /(.*\.(ico$|jpg$|png$|gif$|htm$|html$|css$|js$))
  # Location from which to serve static files.
  static_files: \1
  # Upload static files for static serving.
  upload: (.*\.(ico$|jpg$|png$|gif$|htm$|html$|css$|js$))
  # Ensures that a copy of the static files is left for Drupal during runtime.
  application_readable: true
 
# Catch all unhandled requests and pass to wrapper.php which will simulate
# mod_rewrite by forwarding the requests to index.php?q=...
- url: /(.+)
  script: wrapper.php

Then upload the application:

appcfg.py update drupal/
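
To see which handler a given path would hit, the three url patterns from app.yaml can be replayed in PHP. This is a rough sketch rather than how App Engine is implemented: it assumes handlers are tried in order and that each pattern must match the whole path, and the function name is invented for the example.

```php
<?php
/**
 * Hypothetical helper: reports which app.yaml handler would serve a path,
 * assuming first-match-wins ordering and fully anchored patterns.
 */
function example_match_handler($path) {
  $handlers = array(
    // Default handler for the site root.
    array('#^/$#', 'wrapper.php'),
    // Static assets, using the same extension list as app.yaml.
    array('#^/(.*\.(ico$|jpg$|png$|gif$|htm$|html$|css$|js$))$#', 'static file'),
    // Everything else, including /install.php and clean URLs.
    array('#^/(.+)$#', 'wrapper.php'),
  );
  foreach ($handlers as $handler) {
    if (preg_match($handler[0], $path)) {
      return $handler[1];
    }
  }
  return NULL;
}
```

With these patterns, /misc/drupal.js is served statically, while /, /node/1 and /install.php all reach wrapper.php.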

Install

Visit your-app.appspot.com/install.php and follow the installation steps just as you normally would, except that the database information will already be filled in. Go ahead and ignore the mbstring warning; the GAE team is looking into supporting mbstring.

Explanation of changes

If you are interested in what changes/additions were made and the reasons for them continue reading, otherwise you should have a working Drupal install ready to explore! There are a few basic things that do not work perfectly out of the box on GAE. The changes can be seen by diffing the 7.x-appengine branch against the 7.x branch in my repository.

File directory during installation

The Drupal installer requires that the files directory be writeable, but GAE does not allow local write access, so the requirement must be bypassed in order for the installation to complete.

Author: boombatower <boombatower@google.com>
Date:   Wed May 15 15:49:03 2013 -0700
 
    Hack to trick Drupal into ignoring that file directory is not writable.
 
diff --git a/modules/system/system.install b/modules/system/system.install
index 1b037b8..9931aad 100644
--- a/modules/system/system.install
+++ b/modules/system/system.install
@@ -333,6 +333,8 @@ function system_requirements($phase) {
     }
     $is_writable = is_writable($directory);
     $is_directory = is_dir($directory);
+    // Force Drupal to think the directories are writable during installation.
+    $is_writable = $is_directory = TRUE;
     if (!$is_writable || !$is_directory) {
       $description = '';
       $requirements['file system']['value'] = $t('Not writable');

Clean URLs

Most sites use clean URLs, which require mod_rewrite in Apache environments. Since GAE does not use Apache, it does not support mod_rewrite and thus another solution is needed. The app.yaml file can configure handlers with wildcard matching, which means multiple paths can easily be routed to a single script. Taking that one step further, we can alter the $_GET['q'] variable just as mod_rewrite would so that Drupal functions properly. Rather than modifying core, this can be done via a wrapper script as shown below (this should work well for other PHP applications).

<?php
/**
 * @file
 * Provide mod_rewrite like functionality and correct $_SERVER['SCRIPT_NAME'].
 *
 * Pass through requests for root php files and forward all other requests to
 * index.php with $_GET['q'] equal to path. In terms of how the requests will
 * seem please see the following examples.
 *
 * - /install.php: install.php
 * - /update.php?op=info: update.php?op=info
 * - /foo/bar: index.php?q=/foo/bar
 * - /: index.php?q=/
 */

$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

// Provide mod_rewrite like functionality. If a php file in the root directory
// is explicitly requested then load the file, otherwise load index.php and
// set get variable 'q' to the path portion of $_SERVER['REQUEST_URI'].
if (dirname($path) == '/' && pathinfo($path, PATHINFO_EXTENSION) == 'php') {
  $file = pathinfo($path, PATHINFO_BASENAME);
}
else {
  $file = 'index.php';

  // Provide mod_rewrite like functionality by using the path which excludes
  // any other part of the request query (ie. ignores ?foo=bar).
  $_GET['q'] = $path;
}

// Override the script name to simulate the behavior without wrapper.php.
// Ensure that $_SERVER['SCRIPT_NAME'] always begins with a / to be consistent
// with HTTP request and the value that is normally provided (not what GAE
// currently provides).
$_SERVER['SCRIPT_NAME'] = '/' . $file;
require $file;
?>
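
The routing decision in the wrapper can also be pulled into a side-effect-free function, which makes it easy to check against the examples in the docblock. This is a sketch for experimentation only: example_route() is a made-up name, not part of the branch, and it assumes a Unix-style dirname().

```php
<?php
/**
 * Hypothetical, side-effect-free version of the wrapper's routing decision.
 *
 * @param string $request_uri
 *   A value such as $_SERVER['REQUEST_URI'].
 *
 * @return array
 *   'file' is the script to load; 'q' is the value for $_GET['q'], or NULL
 *   when a root php file is requested directly.
 */
function example_route($request_uri) {
  // Keep only the path, dropping any query string.
  $path = parse_url($request_uri, PHP_URL_PATH);

  // Root-level php files are passed straight through.
  if (dirname($path) == '/' && pathinfo($path, PATHINFO_EXTENSION) == 'php') {
    return array('file' => pathinfo($path, PATHINFO_BASENAME), 'q' => NULL);
  }

  // Everything else is forwarded to index.php with the path as 'q'.
  return array('file' => 'index.php', 'q' => $path);
}
```

Note that only php files directly under the root are passed through; a request such as /sites/foo.php is still forwarded to index.php.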

PHP $_SERVER['SCRIPT_NAME'] variable

The $_SERVER['SCRIPT_NAME'] implementation on GAE differs from the Apache mod_php implementation, which can cause issues with a variety of PHP applications. When called through Apache, the variable matches the HTTP spec and not the filesystem.

For example a script named foo.php contains the following.

<?php
var_dump($_SERVER['SCRIPT_NAME']);
?>

When executed from the command line, the results are as follows.

$ php foo.php
string(7) "foo.php"
 
$ php ./foo.php
string(9) "./foo.php"

When invoked through Apache like http://example.com/foo.php.

string(8) "/foo.php"

The documentation does not talk about this behavior (although many comments demonstrate the expected Apache behavior), but applications definitely depend on it.

The difference causes Drupal to format invalid URLs.

example.com.foo.css (instead of ...com/foo.css)
example.comsubdir/foo.css (instead of ...com/subdir/foo.css)

Drupal derives the URL from dirname() of $_SERVER['SCRIPT_NAME'], which returns . if the value contains no slashes, or just / for something like /index.php.

The wrapper script above solves this by ensuring that the SCRIPT_NAME variable always starts with a leading slash.
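
The effect of the missing slash is easy to reproduce in isolation. The two functions below are illustrative only: example_base_path() is a simplified stand-in for the way a base path can be derived from the script name (it is not Drupal's actual code), and example_normalize_script_name() is a hypothetical helper showing the leading-slash fix.

```php
<?php
/**
 * Simplified illustration: derive a base path from a script name.
 */
function example_base_path($script_name) {
  // dirname() yields '.' when there is no slash, '/' for '/index.php'.
  $dir = rtrim(dirname($script_name), '\/');
  return $dir === '' ? '/' : $dir . '/';
}

/**
 * Hypothetical helper: force a script name to start with a single slash.
 */
function example_normalize_script_name($script_name) {
  // Drop a leading './' as produced by "php ./foo.php".
  if (strpos($script_name, './') === 0) {
    $script_name = substr($script_name, 2);
  }
  return '/' . ltrim($script_name, '/');
}
```

Without the slash, example_base_path('index.php') yields './' instead of '/', which is the kind of value that ends up glued onto the host name; after normalizing, the same input yields '/'.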

HTTP requests

GAE does not yet support outbound sockets for PHP (although they are supported for Python and Java), and if/when it does, the preferred way will continue to be streams due to automatic caching of outbound requests using urlfetch. I have included a small change to provide basic HTTP requests through drupal_http_request(). A proper solution would be to override the drupal_http_request_function variable and provide a fully functional alternative using streams. Drupal 8 has converted drupal_http_request() to use Guzzle, which supports streams. Making a similar conversion for Drupal 7 seems like the cleanest way forward rather than reinventing the change.
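
As a rough idea of what a streams-based request looks like, PHP's http stream wrapper takes its method, headers and body from a stream context. The helper below is a hypothetical sketch, not the change included in the branch; it only builds the options array, which would then be passed to stream_context_create() and used with file_get_contents().

```php
<?php
/**
 * Hypothetical helper: build options for PHP's http stream wrapper.
 *
 * @param string $method
 *   HTTP method such as 'GET' or 'POST'.
 * @param array $headers
 *   Header values keyed by header name.
 * @param string|null $data
 *   Optional request body.
 * @param float $timeout
 *   Read timeout in seconds.
 */
function example_http_context_options($method, array $headers, $data = NULL, $timeout = 30.0) {
  $lines = array();
  foreach ($headers as $name => $value) {
    $lines[] = $name . ': ' . $value;
  }
  $http = array(
    'method' => $method,
    'header' => implode("\r\n", $lines),
    'timeout' => $timeout,
    // Return the body even for 4xx/5xx responses instead of failing.
    'ignore_errors' => TRUE,
  );
  if ($data !== NULL) {
    $http['content'] = $data;
  }
  return array('http' => $http);
}

// Usage (performs a real request, so it is left commented out):
// $context = stream_context_create(example_http_context_options('GET', array('Accept' => 'text/html')));
// $body = file_get_contents('http://example.com/', FALSE, $context);
```

A full replacement for drupal_http_request() would also need to parse the response status and headers and handle redirects, which this sketch leaves out.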

php.ini

GAE disables a number of functions for security reasons, but only softly disables some functions which may then be enabled. Drupal provides access to phpinfo() from admin/reports/status and uses output buffering, both of which are disabled by default. The included php.ini enables both functions in addition to getmypid which is used by drupal_random_bytes().

# See https://developers.google.com/appengine/docs/php/config/php_ini.
 
# Required for ob_*() calls which you can find by grepping.
# grep -nR '\sob_.*()' .
output_buffering = "1"
 
# See https://developers.google.com/appengine/docs/php/runtime#Functions-That-Must-Be-Manually-Enabled
# phpinfo: Provided on admin/reports/status under PHP -> "more information".
# getmypid: Used by drupal_random_bytes(), but not required.
google_app_engine.enable_functions = "getmypid, phpinfo"
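
To confirm that the php.ini settings took effect, one option is to check which functions are callable at runtime. This is a small sketch under the assumption that function_exists() reports FALSE for functions the runtime has disabled; the helper name is invented for the example.

```php
<?php
/**
 * Hypothetical helper: return the names that are not callable functions.
 */
function example_unavailable_functions(array $names) {
  $missing = array();
  foreach ($names as $name) {
    if (!function_exists($name)) {
      $missing[] = $name;
    }
  }
  return $missing;
}

// Usage: an empty array means everything requested is available.
// var_dump(example_unavailable_functions(array('getmypid', 'phpinfo', 'ob_start')));
```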

Future

I plan to continue working with the GAE team to ensure that support for Drupal can be provided in a clean and simple manner. Once current discussions have been resolved I hope to provide more formal documentation and support for Drupal.

File handling

I worked on file support, but there were a number of upcoming changes that would make things much cleaner, so I decided to wait. GAE provides a stream wrapper for Google Cloud Storage which makes using the service very simple. Assuming you have completed the prerequisites, files on GCS may be accessed using standard PHP file handling functions as shown in the documentation.

<?php
$file = 'gs://my_bucket/hello.txt';
file_put_contents($file, 'hello world');

$contents = file_get_contents($file);
var_dump($contents); // prints: string(11) "hello world"
?>

Unfortunately, the wrapper does not currently support directories, nor does file_exists() work properly. Keep in mind that the filesystem is flat, so a file may be written to any path without explicitly creating the directory, meaning one can write to gs://bucket/foo/bar.txt without creating the directory foo. With that being the case, it is possible to get some hacky support by simply disabling all the directory code in Drupal, but the result is not really usable. It should be possible to hack support in through the stream wrapper since directories are simply specially named files, but the App Engine team has indicated they will look into the matter, so hopefully this will be solved cleanly.

Assuming the stream wrappers are fixed up, support can be added in much the same way that Amazon S3 support is added, except that no additional library will be needed.
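
Because the namespace is flat, mapping a Drupal file URI onto a bucket is mostly string manipulation. The function below is a hypothetical sketch of that mapping (the name and the choice to simply drop the scheme are assumptions); a real integration would be implemented as a Drupal stream wrapper.

```php
<?php
/**
 * Hypothetical helper: map a URI such as public://foo/bar.txt onto a bucket.
 *
 * @param string $bucket
 *   Name of the Cloud Storage bucket.
 * @param string $uri
 *   A Drupal-style URI or a plain relative path.
 */
function example_gcs_uri($bucket, $uri) {
  // Drop the scheme (for example "public://") if one is present.
  $pos = strpos($uri, '://');
  $target = $pos === FALSE ? $uri : substr($uri, $pos + 3);
  // Collapse repeated slashes and remove any leading slash.
  $target = ltrim(preg_replace('#/+#', '/', $target), '/');
  // No directory needs to exist for this object name to be writable.
  return 'gs://' . $bucket . '/' . $target;
}
```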

The documentation also notes the following.

Direct file uploads to your POST handler, without using the App Engine upload agent, are not supported and will fail.

In order to support file uploads, the form must be submitted to the URL provided by CloudStorageTools::createUploadUrl() and the forwarded result handled by Drupal. A benefit of proxying requests through the upload service is that uploaded files may be up to 100TB in size.
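
In practice this means the form's action has to be the generated upload URL rather than a Drupal path. The sketch below only renders such a form from a URL that is passed in; the function name and field names are made up, and the arguments to CloudStorageTools::createUploadUrl() are not shown here, so consult the documentation for those.

```php
<?php
/**
 * Hypothetical helper: render an upload form that posts to a given URL.
 *
 * @param string $upload_url
 *   The URL obtained from CloudStorageTools::createUploadUrl().
 */
function example_upload_form($upload_url) {
  // Escape the URL so query strings are safe inside the attribute.
  $action = htmlspecialchars($upload_url, ENT_QUOTES, 'UTF-8');
  return '<form action="' . $action . '" method="post" enctype="multipart/form-data">'
    . '<input type="file" name="upload" />'
    . '<input type="submit" value="Upload" />'
    . '</form>';
}
```

The handler that receives the forwarded request would then read the uploaded file information from $_FILES as usual.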

Other

There are a number of additional services provided as part of GAE of which Drupal could take advantage.

Closing

Hopefully this will be useful in getting folks up and running quickly on GAE with Drupal and understanding the caveats of the environment. Obviously there is a lot more to cover and I look forward to seeing what others publish on the matter.

Comments

As is often the case with many of you who become enamored with a particular technology, you never bother to explain why the rest of us should care. What does this do? What does it compete with? Does this provide significant advantages, and what are they? Who should pay particular attention to this? If you have no answers to these and similar questions then this becomes nothing more than a time sink for others.

Your comment makes sense and those are definitely good questions to answer. As with most things, there are pros and cons to every technology, which makes it hard to provide a simple answer. The purpose of this post was to act as a tutorial and not as a marketing piece explaining the reasons for using Google App Engine. There is a lot of material already available on the subject, but nothing on how to set up Drupal since it just became possible. Doing a search for "google app engine vs ec2" or other hosting platforms provides a lot of details surrounding the differences in billing, level of control / setup required, scaling, ease of use, etc. For example, GAE charges by CPU usage instead of instance hours, which can be very beneficial depending on your load; for sites without much traffic it can be much cheaper. GAE is more of a "give us your code and we will host it" service, whereas EC2 is much more self-serve (set up and manage a box), although they have other services on top of it. In the Drupal world GAE would compare more directly to Acquia or Pantheon hosting.

thanks a lot :)

keep going...

This is great, timely, information as I was just about to check this out myself. Sounds like it might be a good idea to wait a bit. Besides, I'm still waiting for approval. This definitely has the potential to be a game changer in the world of hosting. I'm glad we have a Drupal person "on the inside".

James, I have a suggestion. It's probably better for you and others who think the same to just ignore this technology that some of us are enamored with so that it doesn't become a time sink for you. The author spent the time to write an excellent "how" article and many of us appreciate it.

Hi,

Thanks for the great tutorial. I found this excellent post by searching for different CDN's and cloud hosting options.
I checked out the site you created but it seems very slow for a basic drupal 7 site. I am on the east coast.

What's your opinion? How do you think the speed compares to a traditional server or to other cloud hosting?

Best,
Zoltan
ps: I wanted to sign up for "Notify me when new comments are posted - Replies to my comment" but I don't see an input box for my email address.

Hi,

I've just started a new sandbox project on D.O https://drupal.org/sandbox/FooZee/2032725 with the aim of eventually writing all the required integrations in a set of submodules ...

currently only the MAIL API is written and working (provided that you set the website's email to some valid sender). this is still a work in progress, but I'd really appreciate any feedback

I'm also intending to start writing integration for cloud storage API next then will jump on to task queues hoping to get those soon too :)

I need this so bad. thanks for your effort, it saved me a day.

Thanks for getting some example work out on the internet.
I'm super hopeful that GAE will become my Drupal host of choice.

I think I have followed the steps you outline to the t but I'm failing at install.php with

Drupal already installed
Error message
Warning: PDO::__construct(): MySQL server has gone away in DatabaseConnection->__construct() (line 304 of /base/data/home/apps/s~[instance]/1.369296016011812357/includes/database/database.inc).
Warning: PDO::__construct(): Error while reading greeting packet. PID=-1 in DatabaseConnection->__construct() (line 304 of /base/data/home/apps/s~[instance]/1.369296016011812357/includes/database/database.inc

Any chance you found an error in the way you explain using 'unix_socket'?

Warning: PDO::__construct(): MySQL server has gone away in DatabaseConnection->__construct() (line 304 of /base/data/home/apps/s~ggincsite/1.372845341600851701/includes/database/database.inc).
Warning: PDO::__construct(): Error while reading greeting packet. PID=-1 in DatabaseConnection->__construct() (line 304 of /base/data/home/apps/s~ggincsite/1.372845341600851701/includes/database/database.inc)

Now that I've got the base Drupal install set up, more questions...

Since the file system is not writable, I'm having issues with the CSS aggregation provided by CTools. Is there a way around that?

Also, for some reason all my CSS includes are showing up like this: <link type="text/css" rel="stylesheet" href="http://cityvisioninternships.appspot.com/" media="all" /> It's like the app thinks they should be served by the root document. I have the app.yaml set up exactly as in your example.

Take a look at my most recent post which provides integration that covers the aggregation problem. http://blog.boombatower.com/drupal-integration-module-google-app-engine

Turns out that I had CSS aggregation still enabled in the database. When I edited my {variables} table and then flushed the cache, I was able to view the site with CSS properly.

Now the main issue that I'm facing is that the site seems rather slow (1 sec slower on average than our dedicated server), so I'm testing Memcache on it...

thank you very very much for this great tutorial !!!!!!!!

I finally got whitelisted. I will check how it works.
thanks for your time spending on this project.

I tried applying the same steps above to installing Drupal on the local Dev Server for GAE, all I had to do was change the cloudsql unix_socket setting to /tmp/mysql.sock (I'm on OS X), but when I went to do the install and Drupal checks for the requirements it says gd isn't enabled, and indeed phpinfo() shows nothing related to gd. I wonder why this is, I'd need to be able to develop locally to test any incremental changes to my scripts but the local DevServ can't connect to Cloud SQL.

Got this to work recently using this guide, and with some file handling updates by GAE, things are even smoother than when this was first written back in May.

However one big warning to others: GAE quotas currently limit the number of files to 10,000. While this seems generous, I had a modest site with 20-25 modules and quickly surpassed this number. This quota limit will quickly crush the hopes of many looking to use GAE and Drupal with many typical contributed modules.

This is awesome! Thank you.

i get the standard message below when i follow this installation - note that i have used the branch from Git as advised... is there something i have missed in the steps ?

Settings file The settings file is not writable.
The Drupal installer requires write permissions to ./sites/default/settings.php during the installation process. If you are unsure how to grant file permissions, consult the online handbook.

The filesystem is not writable on App Engine. Please be sure you either handcraft the settings.php file as per the instructions or perform the installation locally using the SDK and devappserver.
