Migrating a web application from Drupal to Processwire - an approach

There are many good reasons for migrating from a CMS like Drupal to Processwire. If you are reading this article then chances are you have already decided to make the switch. But in case you need more convincing, visit the Processwire website or read The Processwire advantage.

While this article refers specifically to migrating a database application from Drupal, the same principles apply to similar CMSs.

If you are migrating a fairly straightforward site then Ryan Cramer’s Import pages from CSV files module may be all you need. For more complicated migrations, or application migrations, an approach such as the one described here may work for you.

The example

The web application in this example is an intranet-based sheet music catalogue. The application is fairly simple, with fields to hold values like title, composer, arrangement, key, genre, condition and so on. Some of these fields have plain text inputs, but many are select inputs (i.e. drop-down menus) with a static list of options. For example, the values for the condition select list are “Poor”, “Fair”, “Good” and “Very good”.

The most complicated part of the app relates to the fact that some sheet music titles are stand-alone, while others are part of a volume of sheet music. The app handles this by having a field called format-type. If the title being entered is a volume then the format-type is “volume”, otherwise it is “single”. Another of the fields in the app is volume. Again, it is a select list, but this one is populated dynamically, by listing the titles of all entries that have a format-type of “volume”. When a “single” item that is part of a volume is added, the appropriate volume is selected from this select list.

Getting your data out of Drupal

The first step is to get the data out of Drupal, in the form of records, one for each sheet music entry in the database.

If you are proficient with Drupal you might be able to write the SQL to do this yourself, but I used Views to generate the SQL that exports the records in the format required. Once I had the view working, I copied the SQL statement (which is provided on the Views edit page) and used phpMyAdmin to access the Drupal database, run the query and export the result set in XML format. I chose XML because it made it easy to do the pre-processing necessary before importing the data into Processwire.
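
The import script later in this article reads `node`, `node_title` and `node_condition` elements, so the pre-processed export presumably looked something like the sample below. This is only a sketch: the root element name `nodes` and the sample values are assumptions made for illustration, not taken from the original export.

```php
<?php
// Hypothetical sample of the pre-processed export. The root element name
// and the values are invented; only node, node_title and node_condition
// are names used by the import script in this article.
$sample = <<<XML
<nodes>
  <node>
    <node_title>Moonlight Sonata</node_title>
    <node_condition>Good</node_condition>
  </node>
  <node>
    <node_title>Clair de Lune</node_title>
    <node_condition>Fair</node_condition>
  </node>
</nodes>
XML;

$xml = simplexml_load_string($sample);

// Collect each record as a plain array, casting SimpleXMLElement values to strings.
$records = [];
foreach ($xml->node as $node) {
    $records[] = [
        'title'     => (string) $node->node_title,
        'condition' => (string) $node->node_condition,
    ];
}

echo count($records) . " records read\n";
```

Checking a small sample like this before running the full import is a cheap way to confirm that the element names in the file match what the script expects.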

Setting up fields and templates in Processwire

Before importing the data into Processwire the fields and templates required to hold the data need to be set up. It would be possible to do this programmatically, rather than through the UI. But for a one-off migration I found it’s just as quick doing it through the UI.

Setting up select lists

Select lists in Processwire are generally handled by creating a separate record (Page) for each option, and then populating the select list by doing a query to select these Pages. I created a template called “select-item” which had no associated template file and only contained the mandatory title field. I then created a container page in the “Pages” tree, called “Resources”, with a container page inside that called “Sheet Music”.

Within these containers I created one container for each select list (e.g. condition, genre). In these I created an entry for each option using the “select-item” template, with the “Title” field holding the value for each option. At this stage it was important to make sure that each option was entered exactly as it appears in the XML export file.

Setting up fields

The next step was to set up the fields needed in Processwire. This is straightforward, so I won’t go into it any further here other than to say that I made the select fields of type “Page” and set the “Parent of selectable pages” to the container holding the options for that select list.

Setting up the template

Once the fields were set up, I set up the template to hold the sheet music entries, which I called “folio”. Again, this is straightforward in Processwire and explained fully in the Processwire documentation.

Setting up the import script

The import script described here is based, in part, on a post dealing with migrating data into Processwire that I found in an online forum, but I have been unable to relocate it. If you recognise the original source please let me know and I will acknowledge it more formally, and link to it here.

  1. Load the XML file into a SimpleXML object, e.g.:
    $xml = simplexml_load_file("nodesheetmusic.xml");
  2. Iterate through the nodes in your SimpleXML object, and for each node create a page in Processwire:
    foreach ($xml->node as $node) {
      $folio = new Page();
      $folio->template = $templates->get("folio");
      $folio->parent = $pages->get(1039); // I used the id of the page, but it could be selected in other ways
  3. Assign values from the node to the fields in the template. For straightforward values, cast the SimpleXML element to a string so that plain text is stored:
    $folio->title = (string) $node->node_title;
  4. For fields that have a select input type, go through all the possible select values and find the page id for the select-item page that matches in Processwire:
    switch ((string) $node->node_condition) { // compare the element's text, not the object
      case "Poor":
        $folio->folio_condition = $pages->get("title=Poor, template=select-item, parent=/resources/sheet-music/condition/")->id;
        break;
      case "Fair":
        $folio->folio_condition = $pages->get("title=Fair, template=select-item, parent=/resources/sheet-music/condition/")->id;
        break;
      case "Good":
        $folio->folio_condition = $pages->get("title=Good, template=select-item, parent=/resources/sheet-music/condition/")->id;
        break;
      case "Very good":
        $folio->folio_condition = $pages->get("title=Very good, template=select-item, parent=/resources/sheet-music/condition/")->id;
        break;
    }
  5. Once all the fields have been added, save the page and close the foreach loop opened in step 2:
      $folio->save();
    }
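
Because each option page's title matches the exported value exactly, the `switch` in step 4 can also be expressed as a lookup table. The sketch below shows the idea in plain PHP so it can be tried outside Processwire. The page ids in `$conditionIds` are made up, and `resolveOptionId` is a hypothetical helper, not part of Processwire; in a real import the map would be built from the option pages under the condition container.

```php
<?php
// Hypothetical map of option title => select-item page id (ids are invented).
$conditionIds = [
    'Poor'      => 1041,
    'Fair'      => 1042,
    'Good'      => 1043,
    'Very good' => 1044,
];

/**
 * Return the page id for an exported select value, or null if no option matches.
 */
function resolveOptionId(array $idsByTitle, string $value): ?int
{
    $value = trim($value); // tolerate stray whitespace in the export
    return $idsByTitle[$value] ?? null;
}

$id = resolveOptionId($conditionIds, 'Very good');
if ($id === null) {
    // Surface mismatches instead of silently saving an empty field.
    echo "No matching option\n";
} else {
    echo "Option page id: $id\n";
}
```

A `null` result is useful during a migration: it flags any exported value that was not entered as an option exactly as it appears in the XML file.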

And there you have it: once the iteration is complete, all the records from the Drupal application have been imported into Processwire!

Not finished yet…

One of the complications in the sheet music catalogue is the relationship between an entry of format-type “single” and any volume that it belongs to. (As described above under “The example”.) The export from Drupal held the details of the volume that an entry belonged to as a node id (nid) that referenced the volume entry — but that is of no use in Processwire.

To get the correct values in Processwire there were a few things that needed to be done:

  1. On importing the values into Processwire, the Drupal id for each entry was imported as a field (folio_nid).
  2. The value of the volume id was also imported as a field (folio_volume_nid).
  3. The select field for volume in Processwire was set up as folio_volume.
  4. Once the import was complete I ran a second script:
    $folios = $pages->find("folio_volume_nid>0"); // find the pages that belong to a volume
    foreach ($folios as $folio) {
      $volume = $pages->get("folio_nid=$folio->folio_volume_nid"); // find the volume they belong to
      if (!$volume->id) {
        continue; // no matching volume was imported, so leave this entry unchanged
      }
      $folio->folio_volume = $volume->id; // set the value of volume to the id of that volume
      $folio->save("folio_volume"); // the field name must be passed as a quoted string
    }

This reflects how the volume value will be stored when additional entries are added through the GUI.
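
The same two-pass idea, mapping old Drupal ids to new page ids, can be illustrated without Processwire. The following is a minimal sketch using plain arrays; the sample rows and page ids are invented, and it builds the nid-to-page-id index once rather than looking up each volume separately, which is a variation on the script above rather than what I actually ran.

```php
<?php
// Invented sample data: each imported page with its Drupal nid and,
// where applicable, the nid of the volume it belongs to.
$imported = [
    ['page_id' => 2001, 'folio_nid' => 10, 'folio_volume_nid' => 0],
    ['page_id' => 2002, 'folio_nid' => 11, 'folio_volume_nid' => 10],
    ['page_id' => 2003, 'folio_nid' => 12, 'folio_volume_nid' => 99], // volume missing
];

// Pass 1: index the new page ids by their old Drupal nid.
$pageIdByNid = [];
foreach ($imported as $row) {
    $pageIdByNid[$row['folio_nid']] = $row['page_id'];
}

// Pass 2: translate each volume nid into the new page id.
$volumeLinks = []; // folio page id => volume page id
$unresolved  = []; // folio page ids whose volume could not be found
foreach ($imported as $row) {
    $volumeNid = $row['folio_volume_nid'];
    if ($volumeNid <= 0) {
        continue; // not part of a volume
    }
    if (isset($pageIdByNid[$volumeNid])) {
        $volumeLinks[$row['page_id']] = $pageIdByNid[$volumeNid];
    } else {
        $unresolved[] = $row['page_id']; // report rather than link to nothing
    }
}

echo count($volumeLinks) . " linked, " . count($unresolved) . " unresolved\n";
```

Keeping a list of unresolved entries makes it easy to spot records whose volume was never exported from Drupal.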

In conclusion

This approach worked remarkably well for the data I was dealing with — so I thought I’d share it with others that might be faced with the same task.

One of the limitations of the approach is the volume of data being migrated. There were about 4000 entries in this database, and running the script did cause a PHP timeout. As I was doing this in a development environment I adjusted the timeout limit in php.ini. Another solution would be to split the XML file into more manageable chunks.
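
Splitting the file can itself be scripted. The following is a rough sketch, assuming the export is a single root element containing `node` records as in the import script above; `splitXmlIntoChunks` is a hypothetical helper name, and the sample document (including its `nodes` root) is invented for illustration.

```php
<?php
/**
 * Split an XML document of <node> records into smaller XML documents.
 *
 * @return string[] One XML string per chunk, each wrapped in the original root element.
 */
function splitXmlIntoChunks(string $xmlString, int $chunkSize): array
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        throw new InvalidArgumentException('Could not parse XML');
    }
    $root = $xml->getName();

    // Serialise every <node> element on its own.
    $nodes = [];
    foreach ($xml->node as $node) {
        $nodes[] = $node->asXML();
    }

    // Group the serialised records and wrap each group in the root element.
    $chunks = [];
    foreach (array_chunk($nodes, $chunkSize) as $group) {
        $chunks[] = "<$root>" . implode('', $group) . "</$root>";
    }
    return $chunks;
}

// Invented sample with five records, split into chunks of two.
$sample = '<nodes>' . str_repeat('<node><node_title>T</node_title></node>', 5) . '</nodes>';
$chunks = splitXmlIntoChunks($sample, 2);
echo count($chunks) . " chunks\n";
```

Each chunk could then be written to its own file with `file_put_contents()` and imported in a separate run, so that no single request has to process all 4000 entries.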

Published: Friday, 9 August 2013

Last updated: Wednesday, 26 March 2014