The Schema API Defining Database Structures

The next task in our module installation process is the defining of our new table.

In previous versions of Drupal, a module author wrote SQL DDL (Data Definition Language) to define any new database structures. However, this caused portability problems. Since SQL DDL is not particularly transferable from one DBMS (DataBase Management System) to another, modules that worked on, say, MySQL could not then be installed on PostgreSQL.

Database abstraction is a good goal

In the past, Drupal has been tightly integrated with MySQL, with moderate support for PostgreSQL. But as Drupal continues to evolve, it is likely that support may be extended to other databases. Oracle, MS ^ SQL Server, and others have been suggested as database platforms for

Drupal. For this reason, Drupal developers are starting to emphasize the desirability of portable SQL. Database-agnostic APIs and generic SQL are desirable for keeping modules portable.

To rectify this situation—and also simplify database management tasks — Drupal developers added a new API. This Schema API provides the facilities for defining abstract database structures, and then converting them to DBMS-specific SQL dDl In other words, the API allows you to define tables, indexes, and so on as PHP code. Then, as necessary, the Schema API can translate these PHP data structures into SQL specific to the particular database engine. From a single schema, Drupal can then create MySQL and PostgreSQL versions (or, indeed, versions for any supported DBMS).

[ _ ■ The Schema API overview describes the API and provides links to all of "

the functions in the Schema API:

Also, once Drupal has a complete schema, it can perform different operations. For example, a table schema can be used to create or drop a table (as well as its indexes).

Practically speaking, then, we will use the hook_schema() hook to define a new schema, and let Drupal do the conversions to SQL as required.

A First Look at the Table Definition

For our new content type, we only need to create a single table, so our hook implementation will be fairly straightforward:

* Implementation of hook_schema(). */

function biography_schema() { $schema['biography'] = array(

'fields' => array( 'vid' => array( 'type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE, 'default' => 0,

'nid' => array( 'type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE, 'default' => 0,

'type' => 'varchar', 'length' => 127, 'not null' => TRUE, 'default' => '',

// Note: On MySQL, text fields cannot have default values, 'life' => array('type' => 'text', 'not null' => FALSE), 'works' => array('type' => 'text', 'not null' => FALSE),

// Version is primary key. Could do nid, vid.

return $schema;

A hook_schema() implementation is expected to return an associative array, where the top level of each array is a table name. Consider a simplified version of the second line of biography_schema():

The $schema array will be the associative array we will return. This line indicates that we are defining a table named biography.

However, this isn't a complete schema. We need to add information about each column in our table.

Our table is to have five columns: vid, nid, dates, life, and works. Let's talk about the purpose of each of these, and then we will revisit the code.

• vid: This is the version ID of the entry in this table. VIDs are unique to the table, and will serve as the primary key. In a Drupal installation that supports versioning, each version of a document will result in a new line being created in this table. This field is required for a content type extension table.

• nid: The node ID tracks the unique ID of a particular node (specified in the node table). When Drupal retrieves a node, it will retrieve the latest available vid for the requested nid. This field is required for a content type extension table.

• dates: This will be a free-form text field for entering date information. Why free-form text instead of a date selection field? Because we need to allow less than precise dates like "Circa 500 B.C.E. to 450 B.C.E" or "Unknown to 368 C.E.".

• life: This field will contain free-form text content (actually, restricted HTML text) that will describe the individual's life.

• works: This will also be used to hold restricted HTML text. This field is used to store bibliographic information.

Now that we know basically what each column is supposed to do, let's take a closer look at the schema.

Defining Fields (Columns)

Inside the array of the biography schema, there are three main arrays:

$schema['biography'] = array( 'fields' => array() 'indexes' => array() 'primary key' => array('vid'),

These arrays provide various aspects of the table's definition.

• fields: This array contains a definition for each of the columns in the table. Since our table will have five columns, this array will contain five entries.

• indexes: This array contains an entry for each index that is to be added to the table.

• primary key: This array contains entries for each component of a primary key.

In addition to these three, the schema API defines two more:

• unique key: This array contains entries for each unique (non-primary) key in a table.

• description: This allows you to attach a textual description to the schema definition.

Let's look at a couple of the definitions in the fields array:

'nid' => array( 'type' => 'int', 'unsigned' => TRUE, 'not null' => TRUE, 'default' => 0,

'type' => 'varchar', 'length' => 127, 'not null' => TRUE, 'default' => '',

Here are the definitions for the nid column and the dates column. The NID (Node ID) is a positive (unsigned) integer that can never be null. The purpose of this field is to correlate our extended table with the main nodes table. A nid in this table will always correspond to the nid field in the node table. Hence, we don't need to configure it to increment automatically.

If we were to take this definition and convert it to MySQL-brand DDL, it would look as follows:

CREATE TABLE "biography" (

"vid" int unsigned NOT NULL default '0', ~nid~ int unsigned NOT NULL default '0',

"dates" varchar(12 7) NOT NULL default '', "life" text,

"works" text, PRIMARY KEY ("vid"), KEY "nid" ("nid")

The nid array key becomes the column name, and the contents of the array are combined to create the definition. All this transformation is done under the hood by the schema API.

Each of the possible fields that can be used in a schema definition is I

documented in the Schema API manual: http://drupal.org/ I

node/14 6 9 3 9. For the most part, the Schema API names follow the I

naming patterns used by MySQL. I

Likewise, the dates entry also represents a column, though in this case the type is varchar instead of an int. Since we know a date field is short, even when expressed in text, we can limit the length of the field to 127 characters. To express the size limit, instead of defining type as varchar(127), we define the type as varchar and set length to 127.

There are a number of predefined sizes, such as small, big, tiny, and so on. (These sizes are commonly used with ints in MySQL.) If no size is specified, or if size is set to normal, then the defaults of the underlying database system are used.

The text type and default values

MySQL does not allow columns of type text to have default values. When the Schema API encounters a default value for a text column, it simply ignores the definition. This, however, can cause unexpected errors if SQL statements assume that a default will be set. For this reason, it is better to insert empty strings into text fields in a Drupal database. (We will see an example of this later in the chapter.)

The other field definitions follow the same patterns. Rather than belaboring the point, we will skip the other fields and take a look at the next two entries in the definition: indexes and primary key.

Defining Keys and Indexes

As we noted earlier, there are three items in the $schema['biography'] array: fields, indexes, and primary key. Above, we looked at the field definitions, which are used to define columns in a database table. Now we will look at the other two:

The indexes array is used to define database indexes. To ensure good performance, you should define indexes on fields used to match rows in a database query. The rule of thumb for this is that fields that are frequently used in the where clause or in joins in your SQL are those that make good candidates for indexes.

Defining an index is fairly simple. Every index needs a name and a list of fields to index. The name can be the same as the indexed fields.

In the example above, we define an index on Node IDs (nid), since that field is frequently used in retrieving information from the table. The index is defined as 'nid' => array('nid'). The array key, nid, will be the name of the index, and the array that it references, array('nid'), is the list of fields indexed here.

To improve performance on some queries, we could change this index to include cases where nid and vid (Version ID) were both used in the where clause of a query. Consider an index like this: 'nidvid' => array('nid','vid'). This defines an index named nidvid that will optimize for queries against Node ID and Version ID. (For example a query like: SELECT * FROM {biography} WHERE nid=10 AND vid=11;.)

MySQL and multi-column indexes

With MySQL, the nidvid index would not only be used for cases where both columns were accessed in the WHERE clause, but also for cases where the first of the columns (nid) is used. In other words, this one index can function as two different indexes—one on nid alone, and one on nid and vid.

Finally, a primary key entry indicates which field or combination of fields is to be treated as the table's primary key. A primary key must be unique and can never be null.

In most content type extension tables (like ours), the vid should be treated as the primary key. (Optionally, you can use the combination of nid and vid, but it is not clear that this offers any benefit.)

Once a field is identified as a primary key (or even a unique key, see the description above) then the field should not be indexed. Attempting to do so will cause database errors.

The structure of the primary key definition is simpler than the index definition: 'primary key' => array('vid'). Primary keys do not need names. Consequently, all the primary key entry requires is an array of fields to treat as the key. The unique keys definition also follows this same pattern.

Now we have created a complex data structure representing our schema definition. Any time the hook_schema() hook is called on this module, this schema definition is returned. Also, in our installation file, the scheme is retrieved twice — once by drupal_install_schema() and once by drupal_uninstall_schema().

At this point, we are done with the biography.install file. The three functions, biography_install(), biography_uninstall(), and biography_schema(), are the only functions we need to install this module.

Now we are ready to return to biography.module and make use of our newly defined database table.

+1 -1

Average user rating: 3 stars out of 2 votes

Post a comment

  • Receive news updates via email from this site