views

Handling multicolumn and aggregated data using Drupal 7 fields

The following technique and many of the tools were developed/improved by ReviewDriven.

Problems

The field API in Drupal 7 combined with Views is very powerful combination. Yet there are certain data structures that are difficult or inefficient to work with using the two tools. One such structure is tables of data with multiple columns which can be stored using the fields API, but have a number of issues that arise.

  • Field data is loaded every time the entity is loaded which can be catastrophic with large datasets.
    • For example, when editing a node with a large dataset attached each value has a corresponding field element generated which eats up memory and processing power quickly and can easily cause white screens.
    • When viewing a node even if the fields are hidden the data is loaded and again is a tax on memory.
  • Relationship between columns is not exported to Views, since it has no way to know, which limits the way values may be displayed.
  • Aggregating columns in Views cannot be done across entities.
  • Views loads field data through the field API which is inefficient for large datasets, but ensures that display formatters and such are respected.

Solutions

Thankfully, there are a number of tools that can be utilized to solve each of these problems and in combination provide a very powerful way to handle tables of data and aggregation across entities.

Field suppress

From the Field suppress project page.

Suppress field data from being loaded during entity_load(). Since field data will not be loaded it will not be displayed nor editable through the interface. This can be handy if you are using an alternate means to display or edit data and/or if you have a large amount of data in fields which will cause the node edit (or similar) interface to use a huge amount of memory and take a very long time to build.

Field suppress solves the first problem, but data will then need to be edited directly through code and manually displayed (ex. Views).

Field group views

The next problem requires an interface/API for defining relationships between fields (or columns). Relationships can be defined using Field group views which is a plugin for Field group which provides both a UI and exportables for defining groups of fields. A display group can be defined and the plugin set to Views which will then export the proper relationships to Views and generate a stub view which can then be customized. The view will automatically replace the fields when displaying the entity. A requirement of Field group views is Views field which actually solves the last two problems.

Views field

Views field exposes the field tables and revision field tables to Views as base tables. Using the field base tables means that field data can be loaded directly instead of going through the field API. Loading data directly is much more efficient (especially for large datasets) and allows for aggregation across multiple entities, but losses the formatting capabilities of the fields API (could possibly add support for formatting to Views field). Formatting can also be added to the exposed field base table through hook_views_data_alter() in the following manor.

<?php
/**
 * Implements hook_views_data_alter().
 */
function conduit_views_data_alter(&$data) {
  
// May not always be '_value' depending on field type.
  
$data['field_data_MY_FIELD']['MY_FIELD_value']['field']['handler'] = 'views_handler_HANDLER';
}
?>

Please note, there are two bugs that cause annoyances due to changes in the Views API, but do no prevent Views field from working. Feel free to submit patches.

Example

I have utilized these tools in combination on a number of projects with quite satisfying results, but I will attempt to provide a few generic examples to provide a clearer picture of how these tools can be used.

Tallying summary results

If you have a collection of entities and you want to be able to group the overall field data and perform SQL operations like COUNT() or SUM() Views field makes it easy. The core poll module could be rewritten using this technique so we will use poll as an example (could be done many ways).

Say you have a node type for "Foo" poll entries that looks something like the following.

poll_foo_entry

  • poll_foo_value: customizable field capable of storing the value for a poll, in this case lets go with a text field

Results can be calculated using a view with the base table poll_foo_value and aggregation enabled. The poll_foo_value column can be used as the group by column and additional columns can be added for determining the COUNT(). You could even then display the results using a chart plugin for views.

I used this technique to create http://survey.reviewdriven.com/results. The tallied results are display on the left.

Table of data

Another powerful usecase is storing and displaying a table of data. Lets use a simple example of storing temperature data over time on nodes (possibly a node per region). A possible node structure is as follows.

temperature_history

  • title: region or some such
  • date: mulivalue date field
  • temperature: mulivalue temperature field

The date and temperature fields can then be related using Field group views by placing them in the same group and displayed using a view. The fields are related based on each fields delta. In other words the data is stored using the field API in the following manor.

date[0] = 2011-09-01
date[1] = 2011-09-02
date[2] = 2011-09-03
date[3] = 2011-09-04
date[4] = 2011-09-06
 
temperature[0] = 70
temperature[1] = 69
temperature[2] = 68
temperature[3] = 67
temperature[4] = 66

The number in brackets being the delta which allows the tables to be joined intelligently and produce a table like the following.

Date Temperature
2011-09-01 70
2011-09-02 69
2011-09-03 68
2011-09-04 67
2011-09-06 66

Improvements

There are still some areas for improvement, but one only has so much time.

Per-Bundle Storage

Storing related field data in a single table allows for the removal of overlapping columns and removes the need to join multiple tables. A field group could then be placed in a separate bundle or otherwise exposed to PBS and stored together in a single table. The database scheme would then be much more manageable and easier to query manually.

Please keep in mind that PBS is not currently functional.

Editing

Being able to edit the dataset using a view would also be a major plus. There a number of modules that provide functionality of this sort, but not in such a flexible manor. Something like Editview would complete Field group views functionality. Large datasets could then be paginated for editing to prevent overload.

Exciting usage

Given that Views is plugable virtually an data structure could be displayed using this technique. The advantage over writing a one off field is that the structure is then easy to extend and modify, and requires little to no code to create. Simply export the fields, and field group definitions.

Fields such as the Name field could be turned into a collection of fields displayed using a view and editable (assuming that gets implemented) using a similar structure.

The possibilities opened up by this technique are quite exciting and I look forward to seeing what people come up with.

Git bisect saves the day

I have been working on a private project for quite some time using git. Yesterday, I noticed that one of the views on the site was taking around 10 seconds to generate instead of less than a second like it used to. I scanned through the recent commits, scratched my head, and messed with a bunch of stuff, but to no avail. I reverted to a commit from over a month ago just for kicks and sure enough the view rendered quickly again. So I decided it was time to learn how to use git bisect.

From git documentation:

Find by binary search the change that introduced a bug

I had read in passing the general idea behind git bisect and that you could use a script that returned pass or fail to automate the process. After reading through the man page I confirmed that a script may be used and simply needs to return exits status code 0 for pass and 1 for fail. So I figured I could write a script that manually executes the view and check to see if the amount of time required was over a certain threshold. Interestingly it seems the view executes much faster using the script than inside a normal page request. Thus the threshold used is much lower then one might expect.

I created two files since I wanted to use drush php-script and follow the documentation's recommendation by placing the scripts outside the repository. The first script is a wrapper that simply changes to the directory in which Drupal is installed and then executes the drush command.

git_bisect.sh

#!/bin/bash
cd /path/to/drupal
drush php-script ~/check_view.php

check_view.php

<?php
drupal_flush_all_caches
();

$view views_get_view('MY_CUSTOM_VIEW');
$view->set_arguments(array(3)); // Test data.
$start microtime(TRUE);
$view->execute();
$stop microtime(TRUE);

// 0: pass, 1: fail
$diff $stop $start;
$status $diff 0.3 0// Threshold.

var_dump($diff);
var_dump($status);

// If exit(0) is called drush still views it as abnormal shutdown and sets code
// to non-zero so only call when we want abnormal shutdown.
if ($status != 0) {
  exit(
1);
}
?>

My case was a bit more complex since code beyond a certain point was incompatible since I had to backup a related module to work with old revisions.

078d60ad18b73ec356436a7ea30528c95c9c4844 (bad)
3f1cfca83821a6b2d694cf228e5d8af3db20922f (good)

I ran the following inside the repository directory.

git bisect start 078d60ad18b73ec356436a7ea30528c95c9c4844 3f1cfca83821a6b2d694cf228e5d8af3db20922f --
git bisect run ~/git_bisect.sh

I ended up with the following result (-- indicates where I scrubbed data for privacy).

running /home/boombatower/git_bisect.sh
float(0.43191289901733)
int(1)
Drush command terminated abnormally due to an unrecoverable error.
Bisecting: 7 revisions left to test after this (roughly 3 steps)
[50dcca7e9cec514c2bcc24156cd8b4622eb2cd3e] -- message --
running /home/boombatower/git_bisect.sh
float(0.53287100791931)
int(1)
Drush command terminated abnormally due to an unrecoverable error.
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[37e793d693a75a55470e6a92f5e3f30649ee2214] -- message --
running /home/boombatower/git_bisect.sh
float(0.18644404411316)
int(0)
Bisecting: 1 revision left to test after this (roughly 1 step)
[77ddb40b26fe2436cd7a15549109ffa9095d6995] -- message --
running /home/boombatower/git_bisect.sh
float(0.51487994194031)
int(1)
Drush command terminated abnormally due to an unrecoverable error.
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[08d99c98f4b9d837775db47770bc125727d93dc6] -- message --
running /home/boombatower/git_bisect.sh
float(0.1781919002533)
int(0)
77ddb40b26fe2436cd7a15549109ffa9095d6995 is the first bad commit
commit 77ddb40b26fe2436cd7a15549109ffa9095d6995
Author: --
Date:   --
 
    -- message --
 
:040000 040000 a4d3a8cb990d0eff7a7dd87c941a3f12b768feaf 10ab97d60a9b03a56862828e0f9f66cf7f4ef6b4 M      --
:100644 100644 7254a3027edfb35e10b948a9dcd994a9fbdd44a3 0a8ef37931373929328fe6458bf1f595549d265a M      --
bisect run success

Sure enough the "first bad commit" was indeed the commit that caused the performance issue. Very cool!

Subscribe to RSS - views